1. Description of a desirable model

Let us suppose that we are investigating a system whose state can be adequately specified by $n$ real numbers $x^1, \dots, x^n$. We shall suppose that by some acceptable scientific theory it is predicted that, in the absence of disturbances from outside the system, the $x^i$ develop in time in accordance with certain differential equations,

$$\frac{dx^i}{dt} = g_0^i(t, x), \qquad i = 1, \dots, n. \tag{1.1}$$

If there are disturbances or noises $n^1(t), \dots, n^r(t)$, the underlying theory of such systems will often permit us to conclude that

$$\frac{dx^i}{dt} = g_0^i(t, x) + \sum_{\rho=1}^{r} g_\rho^i(t, x)\, n^\rho(t), \qquad i = 1, \dots, n, \tag{1.2}$$

where $g_\rho^i$ is the sensitivity of the $i$th coordinate to the $\rho$th noise. However, in the underlying theory, equation (1.2) will usually have a limited domain of applicability; in particular, we could not usually retain confidence in the trustworthiness of (1.2) if the noise were unbounded. But for sufficiently well-behaved bounded noises we can rewrite (1.2) in the form

$$dx^i = g_0^i(t, x)\, dt + \sum_{\rho} g_\rho^i(t, x)\, dz^\rho, \tag{1.3}$$

or

$$x^i(t) = x^i(a) + \int_a^t g_0^i[s, x(s)]\, ds + \sum_{\rho} \int_a^t g_\rho^i[s, x(s)]\, dz^\rho(s), \tag{1.4}$$

where

$$z^\rho(t) = z^\rho(a) + \int_a^t n^\rho(s)\, ds; \tag{1.5}$$

with bounded $n^\rho$, or Lipschitzian $z^\rho$, these equations are solvable by traditional methods, and (perhaps with still stronger requirements on the $z^\rho$) they will describe the evolution of the system with as much certainty as the underlying scientific theory of such systems permits.

Usually, however, we are interested not in the response of the system to specified noises $z^\rho$, but in statistical properties of the responses of the system to random noises. As is well known, this causes a dilemma. The processes $z^\rho$ most amenable to probabilistic study are martingales, especially the Wiener process and closely related processes. But these have almost surely non-Lipschitzian sample functions and lie outside the domain of applicability of the scientific theory that led to (1.4). The integrals with respect to $z^\rho$ in (1.4) cannot even be interpreted as Riemann-Stieltjes or Lebesgue-Stieltjes integrals. Interpreting them as Ito integrals restores meaning to all terms in (1.4), but gives no ground for confidence that the solution of (1.4) will continue to represent the time development of the system. It is a familiar fact that the uncritical use of (1.4) can lead to mismatches between system and model that are often considered paradoxical.

E. Wong and M. Zakai have made a major contribution [8], [9] to the removal of these “paradoxes.” Suppose that we are studying a system which, for Lipschitzian disturbances $z^\rho$, is governed by (1.4) with $n = r = 1$. For notational simplicity we omit the superscripts on $x^i$, $z^\rho$, $g_\rho^i$, and so forth. Let $z$ be a Brownian motion process on an interval $[a, b]$. Let $\pi$ be a finite set of numbers $t_1, \dots, t_{k+1}$ with

$$a = t_1 < t_2 < \cdots < t_{k+1} = b, \tag{1.6}$$

and let $Z$ be the process whose sample paths coincide with those of $z$ at the $t_j$ and are linear between them. Then the solutions $X$ of (1.4) with $Z$ in place of $z$, that is, the solutions of the ordinary equations

$$X(t) = x_0 + \int_a^t g_0[s, X(s)]\, ds + \int_a^t g_1[s, X(s)]\, dZ(s), \tag{1.7}$$

are random variables; and as the mesh of $\pi$ (that is, $\max_j [t_{j+1} - t_j]$) tends to 0, the $X$ converge in quadratic mean to a limit $x$. But this limit is not the solution of (1.4), but of


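$$x(t) = x_0 + \int_a^t g_0[s, x(s)]\, ds + \int_a^t g_1[s, x(s)]\, dz(s) + \frac{1}{2} \int_a^t \frac{\partial g_1}{\partial x}[s, x(s)]\, g_1[s, x(s)]\, ds, \tag{1.8}$$

the integral with respect to $z$ being an Ito integral. (Wong and Zakai have also established this for a more general class of disturbances than Brownian motion processes; see [9].)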
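The Wong-Zakai phenomenon is easy to see numerically. The following Python/NumPy sketch is our own illustration, not a procedure from [8] or [9]; the coefficients $g_0(x) = -x$, $g_1(x) = x$, the interval $[0, 1]$, $x_0 = 1$, the grid sizes, and the seed are all hypothetical choices. It solves the ordinary equation (1.7) along the polygonal process $Z$ (classical RK4 on each linear piece) and compares the result with Euler-Maruyama approximations of (1.4) and of (1.8), both read as Ito equations.

```python
import numpy as np

# Hypothetical coefficients chosen for illustration: g0(x) = -x, g1(x) = x.
def g0(x):
    return -x

def g1(x):
    return x

def dg1(x):
    return 1.0  # derivative of g1 with respect to x

def wong_zakai_demo(n_fine=2**14, n_coarse=256, x0=1.0, seed=0):
    """Return (X from (1.7), Euler-Maruyama for (1.4), Euler-Maruyama for (1.8), w(1))."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_fine
    dw = rng.normal(0.0, np.sqrt(dt), n_fine)
    w = np.concatenate(([0.0], np.cumsum(dw)))  # Brownian path on the fine grid

    # (1.7): on each piece of the polygon Z the slope m is constant, so
    # dX/dt = g0(X) + g1(X) * m is an ordinary ODE; integrate it with RK4 substeps.
    step = n_fine // n_coarse
    h_piece = step * dt
    substeps = 16
    h = h_piece / substeps
    X = x0
    for j in range(n_coarse):
        m = (w[(j + 1) * step] - w[j * step]) / h_piece
        rhs = lambda x: g0(x) + g1(x) * m
        for _ in range(substeps):
            k1 = rhs(X)
            k2 = rhs(X + 0.5 * h * k1)
            k3 = rhs(X + 0.5 * h * k2)
            k4 = rhs(X + h * k3)
            X += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

    # Euler-Maruyama on the fine grid: (1.4) as an Ito equation, and (1.8).
    x14, x18 = x0, x0
    for i in range(n_fine):
        x14 = x14 + g0(x14) * dt + g1(x14) * dw[i]
        x18 = x18 + (g0(x18) + 0.5 * dg1(x18) * g1(x18)) * dt + g1(x18) * dw[i]
    return X, x14, x18, w[-1]

X, x14, x18, w1 = wong_zakai_demo()
print(f"polygonal (1.7): {X:.4f}   E-M for (1.8): {x18:.4f}   E-M for (1.4): {x14:.4f}")
```

For this linear choice both (1.7) (at the nodes $t_j$) and (1.8) have the exact solution $\exp(-t + w(t))$, while the Ito reading of (1.4) has the solution $\exp(-3t/2 + w(t))$; so the first two printed values should nearly agree, and the third should be smaller by a factor close to $e^{-1/2}$.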

These results of Wong and Zakai show us, at least in some important cases, how to model systems affected by noise. If for Lipschitzian disturbances the system evolves according to (1.4) (with subscripts and superscripts suppressed), then, if the physically admissible Lipschitzian disturbances are idealized to Brownian motion processes, equation (1.4) should be replaced by (1.8). If this is done, the solution of (1.8) will be close in quadratic mean to the solutions of (1.4) for at least some Lipschitzian disturbances with finite-dimensional distributions close to those of the Brownian motion idealization.

Nevertheless, it is at least inconvenient, as well as aesthetically unsatisfying, to have different equations for different types of disturbances. It would be preferable to have a theory of integration that would apply to processes with Lipschitzian sample functions, to Brownian motion, and to other martingales that have so often proved useful; and correspondingly, it would be preferable to have a method of modeling systems that is consistent with the basic model (1.4) when the disturbances are Lipschitzian and gives “nearly” the same result when a Lipschitzian disturbance is replaced by a martingale-type idealization that is in some reasonable sense “close” to it. More specifically, we shall seek to replace (1.4) by another set of so-called differential equations (really integral equations) with the following desirable properties.

(a) Inclusiveness. The integrals in the equations should be defined for some recognizable class of processes $z^\rho$, large enough to include all processes with Lipschitzian sample paths and also to include Brownian motion and such modifications of Brownian motion as have been useful in applications.

(b)  Consistency.  For  Lipschitzian  disturbances,  the  solutions  of  the 
equations  should  coincide  with  the  solutions  of  the  equations  (1.4)  that  are 
given  to  us  (for  smooth  disturbances)  by  the  scientific  theory  of  the  system. 

(c) Stability. This property is not easy to describe precisely. Suppose that we have introduced some sort of topology in the space of random processes, so that the convergence of a sequence of processes $z_1, z_2, \dots$ to a limit process $z$ is meaningful and is in principle experimentally verifiable, with the customary allowance for experimental error. Then, under restrictions that are not excessive, if processes $(z_j^1, \dots, z_j^r)$ converge to $(z^1, \dots, z^r)$, the solutions $(x_j^1, \dots, x_j^n)$ corresponding to the $z_j$ should also converge to the solutions $(x^1, \dots, x^n)$ corresponding to the limit $(z^1, \dots, z^r)$. As a special case, if $n = r = 1$, the solution of the equation when $z$ is Brownian motion should coincide with the solution of the Wong-Zakai equation (1.8).

In order to develop such a theory, we must define, for a class of processes with
the inclusiveness property (a), the types of integrals needed in the equations; we
must  develop  a  calculus  for  these  integrals  that  will  permit  us  to  study  differential 
equations;  we  must  specify  the  differential  equations  of  our  model;  we  must 
show  that  these  differential  equations  are  solvable;  and  we  must  show  that  their 
solutions  possess  the  consistency  property  (b)  and  the  stability  property  (c). 
The  remainder  of  this  paper  is  an  outline  of  the  steps  in  this  program. 

During  the  Sixth  Berkeley  Symposium,  I  had  the  pleasure  and  profit  of  several 
conversations  with  Professor  Eugene  Wong.  In  particular,  the  present  version  of 
Theorem  9.1  owes  its  existence  to  his  tactfully  expressed  dissatisfaction  with  an 
earlier  version  in  which  weaker  conclusions  were  drawn  from  stronger 
hypotheses. 
2. Definition of the integral

To avoid repetition, we henceforth suppose that $(\Omega, \mathcal{A}, P)$ is a probability triple, that $T$ is a set of real numbers, that $[a, b]$ is a closed interval contained in $T$, and also that

$$f = (f(\tau, \omega) : \tau \in T,\ \omega \in \Omega), \qquad z^k = (z^k(t, \omega) : t \in [a, b],\ \omega \in \Omega), \tag{2.1}$$

$k = 1, \dots, q$, are real stochastic processes on $T$ and on $[a, b]$, respectively. By a partition of $[a, b]$ (with evaluation points in $T$) we shall mean a finite set

$$\Pi = (t_1, \dots, t_{l+1};\ \tau_1, \dots, \tau_l) \tag{2.2}$$

of real numbers such that

$$a = t_1 < t_2 < \cdots < t_{l+1} = b \tag{2.3}$$

and $\tau_i \in T$, $i = 1, \dots, l$. The $t_i$ are called the division points of $\Pi$, and the $\tau_i$ the evaluation points of $\Pi$. (We usually omit the words “with evaluation points in $T$.”) Apart from notation, the partitions $\Pi$ with $\tau_i = t_i$ were used one hundred and fifty years ago by Cauchy to define the integral of a continuous function; so we shall call them Cauchy partitions. Partitions with $\tau_i$ in $[t_i, t_{i+1}]$ for each $i$ were used by Riemann, and we shall call them Riemann partitions. But for use with stochastic processes it proves highly advantageous to use partitions such that $\tau_i \le t_i$, $i = 1, \dots, l$, and these we shall call belated partitions.

If $\Pi$ is a Cauchy, or Riemann, or belated partition, with notation (2.2), we define

$$\operatorname{mesh} \Pi = \max\{t_{j+1} - \min\{t_j, \tau_j\} : j = 1, \dots, l\}. \tag{2.4}$$

Corresponding to the processes (2.1) and the partition (2.2), we define the Riemann sum $S(\Pi; f, z^1, \dots, z^q)$ to be the random variable (r.v.) whose value at $\omega$ (in $\Omega$) is given by

$$S(\Pi; f, z^1, \dots, z^q)(\omega) = \sum_{i=1}^{l} \Big\{ f(\tau_i, \omega) \prod_{k=1}^{q} [z^k(t_{i+1}, \omega) - z^k(t_i, \omega)] \Big\}. \tag{2.5}$$
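Formulas (2.4) and (2.5) are finite computations for each fixed $\omega$. A minimal Python sketch in which each process is represented by a single sample function (a callable of $t$); the helper names `partition_kinds`, `mesh`, and `riemann_sum` are ours, not the paper's:

```python
import math
from typing import Callable, Sequence

SamplePath = Callable[[float], float]  # t -> z(t, omega) for one fixed omega

def partition_kinds(t: Sequence[float], tau: Sequence[float]) -> set:
    """Which of Cauchy / Riemann / belated the partition (t_1..t_{l+1}; tau_1..tau_l) is."""
    l = len(tau)
    if len(t) != l + 1 or any(t[i] >= t[i + 1] for i in range(l)):
        raise ValueError("need a = t_1 < ... < t_{l+1} = b and l evaluation points")
    kinds = set()
    if all(tau[i] == t[i] for i in range(l)):
        kinds.add("cauchy")
    if all(t[i] <= tau[i] <= t[i + 1] for i in range(l)):
        kinds.add("riemann")
    if all(tau[i] <= t[i] for i in range(l)):
        kinds.add("belated")
    return kinds

def mesh(t: Sequence[float], tau: Sequence[float]) -> float:
    """(2.4): max over j of t_{j+1} - min(t_j, tau_j)."""
    return max(t[j + 1] - min(t[j], tau[j]) for j in range(len(tau)))

def riemann_sum(t, tau, f: SamplePath, zs: Sequence[SamplePath]) -> float:
    """(2.5) at one omega: sum_i f(tau_i) * prod_k [z^k(t_{i+1}) - z^k(t_i)]."""
    return sum(f(tau[i]) * math.prod(z(t[i + 1]) - z(t[i]) for z in zs)
               for i in range(len(tau)))

t = [0.0, 0.5, 1.0]
print(partition_kinds(t, [0.0, 0.25]), mesh(t, [0.0, 0.25]))  # {'belated'} 0.75
```

A Cauchy choice such as `tau = [0.0, 0.5]` is reported as all three kinds, since $\tau_i = t_i$ satisfies both $t_i \le \tau_i \le t_{i+1}$ and $\tau_i \le t_i$.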

We can now define the family of integrals that we shall use in our models.

Definition 2.1. The process $f$ has a belated integral with respect to $(z^1, \dots, z^q)$ over $[a, b]$ if, $\Pi$ being restricted to the class of belated partitions of $[a, b]$, there is an r.v. $J$ such that $S(\Pi; f, z^1, \dots, z^q)$ converges in probability to $J$ as $\operatorname{mesh} \Pi$ tends to 0. Every such $J$ is called a weak version of the integral, and is denoted (possibly ambiguously) by

$$(w) \int_a^b f(t, \omega)\, dz^1(t, \omega) \cdots dz^q(t, \omega). \tag{2.6}$$

Such a $J$ is a strong version of the integral, and is denoted by

$$\int_a^b f(t, \omega)\, dz^1(t, \omega) \cdots dz^q(t, \omega), \tag{2.7}$$

if for each $\omega_0$ in $\Omega$ such that the limit (with notation (2.2))

$$\mathcal{J}(\omega_0) = \lim_{\operatorname{mesh} \Pi \to 0} \sum_{i=1}^{l} f(\tau_i, \omega_0) \prod_{k=1}^{q} [z^k(t_{i+1}, \omega_0) - z^k(t_i, \omega_0)] \tag{2.8}$$

exists, it is true that $J(\omega_0) = \mathcal{J}(\omega_0)$.

As usual, we omit the $\omega$ when convenient. It is quite easy to show that if $f$ is integrable with respect to $(z^1, \dots, z^q)$, a strong version of the integral exists.


3.  The  stochastic  model 

If the sample functions of $f$ are bounded and those of the $z^k$ are Lipschitzian, there is no difficulty in proving that if $q > 1$, then

$$\int_a^b f(t)\, dz^1(t) \cdots dz^q(t) = 0. \tag{3.1}$$

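Suppose then that the functions $g_0^i$ and $g_\rho^i$ and the derivatives of the latter with respect to the $x^j$ are continuous. By (3.1), if the sample functions $x^i$ all satisfy (1.4) and the functions

$$g_{\rho,\sigma}^i(x, t), \qquad i = 1, \dots, n;\ \rho, \sigma = 1, \dots, r;\ t \in T,\ x \in \mathbb{R}^n, \tag{3.2}$$

are continuous, then the integrals

$$\int_a^t \sum_{\rho,\sigma} g_{\rho,\sigma}^i[x(s), s]\, dz^\rho(s)\, dz^\sigma(s) \tag{3.3}$$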

exist and are zero for all $i$ in $\{1, \dots, n\}$ and $t$ in $T$. Hence, the $x^i$ also satisfy

$$x^i(t) = x^i(a) + \int_a^t g_0^i[x(s), s]\, ds + \sum_{\rho} \int_a^t g_\rho^i[x(s), s]\, dz^\rho(s) + \frac{1}{2} \sum_{\rho,\sigma} \int_a^t g_{\rho,\sigma}^i[x(s), s]\, dz^\rho(s)\, dz^\sigma(s), \tag{3.4}$$

$i = 1, \dots, n$, $a \le t \le b$, the integrals either being computed for each sample curve or understood as strong versions of belated integrals. No matter how we choose the (continuous) functions (3.2), we obtain the consistency property (b).
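Identity (3.1), on which this consistency argument rests, depends on the $z^k$ being Lipschitzian. A small NumPy experiment (our own, with an arbitrary smooth path $\sin 3t$, grid size, and seed) computes the Riemann sum (2.5) with $f \equiv 1$, $q = 2$, and $z^1 = z^2$ over a uniform Cauchy partition of $[0, 1]$, once for the smooth path and once for a simulated Wiener path:

```python
import numpy as np

def squared_increment_sum(values: np.ndarray) -> float:
    """Riemann sum (2.5) with f = 1, q = 2, z^1 = z^2: the sum of squared increments."""
    dz = np.diff(values)
    return float(np.sum(dz * dz))

def compare_paths(n: int = 100_000, seed: int = 1):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n + 1)
    smooth = np.sin(3.0 * t)  # Lipschitzian, with Lipschitz constant 3
    w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))))
    return squared_increment_sum(smooth), squared_increment_sum(w)

smooth_sum, wiener_sum = compare_paths()
print(smooth_sum, wiener_sum)
```

The first value is at most $9/n$, as the Lipschitz bound predicts, while the second stays close to $b - a = 1$ however fine the partition; this is why the integrals (3.3) need not vanish once Wiener-type $z^\rho$ are admitted.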

But soon we shall show that the belated integrals can be defined for a class of processes large enough to possess the inclusiveness property (a). When this larger class of $z^\rho$ is permitted, the integrals in (3.3) no longer all vanish, and the stability property (c) does not hold for all choices of functions (3.2). In fact, it is far from clear that it will hold for any such functions. We make the choice

$$g_{\rho,\sigma}^i(x, t) = \sum_{j=1}^{n} \frac{\partial g_\rho^i}{\partial x^j}(x, t)\, g_\sigma^j(x, t), \qquad i = 1, \dots, n. \tag{3.5}$$

This is our selection principle. We do not consider that we have added a correction term (3.5) to the “standard” equation (1.4). Rather, from the aggregate of all equations (3.4), we have selected the one specified by (3.5) instead of the simplest-looking one with all functions (3.2) equal to 0. All the equations (3.4) are equally in accord with the underlying theory that gave us equations (1.4), assuming as before that this theory has been established only for Lipschitzian $z^\rho$. But setting the functions (3.2) equal to 0 gives us merely typographical simplicity, while (as we shall ultimately show) the choice (3.5) gives us, at least under some restrictions, the much more important virtue of stability.
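Because (3.5) is an explicit algebraic recipe, the coefficients $g_{\rho,\sigma}^i$ can be produced mechanically from the $g_\rho^i$. A minimal NumPy sketch under our own conventions: a hypothetical callable `g(x, t)` returns the $n \times r$ matrix whose column $\rho$ is $(g_\rho^1, \dots, g_\rho^n)$, and the $x$-derivatives are approximated by central differences with step `h` rather than computed symbolically.

```python
import numpy as np

def correction_coefficients(g, x, t, h=1e-6):
    """Coefficients of the selection principle (3.5) at (x, t).

    g(x, t) must return an (n, r) array whose column rho holds g_rho^1..g_rho^n
    (the drift g_0 is not included).  Returns G with shape (n, r, r), where
    G[i, rho, sigma] = sum_j d g_rho^i / d x^j (x, t) * g_sigma^j(x, t).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    g_val = np.asarray(g(x, t), dtype=float)      # shape (n, r)
    r = g_val.shape[1]
    jac = np.empty((n, r, n))                      # jac[i, rho, j] = d g_rho^i / d x^j
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        jac[:, :, j] = (np.asarray(g(x + e, t), dtype=float)
                        - np.asarray(g(x - e, t), dtype=float)) / (2.0 * h)
    return np.einsum("ipj,js->ips", jac, g_val)

rotation = lambda x, t: np.array([[-x[1]], [x[0]]])   # n = 2, r = 1
print(correction_coefficients(rotation, [0.3, -1.2], 0.0)[:, 0, 0])  # approximately -x = [-0.3, 1.2]
```

For $n = r = 1$ the result is $g_1'(x)\, g_1(x)$, the product that appears in the extra term of (1.8); for the rotation field $g_1(x) = (-x^2, x^1)$ used above it is $-x$.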

4.  Principal  existence  theorem  for  the  belated  integral 

Throughout this paper, $T$ will denote a set of real numbers and $[a, b]$ a closed interval contained in $T$. Moreover, the symbol $F_\tau$ ($\tau \in T$) will always denote a $\sigma$-subalgebra of $\mathcal{A}$, and we shall always assume that if $\tau$ and $\sigma$ are in $T$ and $\sigma \le \tau$, then $F_\sigma \subset F_\tau$.

For the sake of brevity, if $x$ is a process defined on some subset $D_x$ of $T$, we shall use the expression “$x$ is $F_\cdot$ measurable” to mean “for each $\tau$ in $D_x$, $x(\tau, \cdot)$ is $F_\tau$ measurable.” Furthermore, to avoid complicated typography we use $F(\tau)$ to denote $F_\tau$ whenever convenient; in particular, we write $F(t_j)$ instead of writing the $t_j$ as a subscript to the $F$.

The processes $(z^1, \dots, z^r)$ that play the principal role in our theory are those processes on $[a, b]$ that satisfy the following conditions.

Condition 4.1. Each $z^\rho$ ($\rho = 1, \dots, r$) is $F_\cdot$ measurable, and there exist positive numbers $K$ and $\delta$ and a positive integer $q$ such that if $\rho \in \{1, \dots, r\}$ and $a \le s \le t \le b$ and $t - s < \delta$, then a.s.

$$\begin{aligned} |E([z^\rho(t) - z^\rho(s)] \mid F_s)| &\le K(t - s), \\ E([z^\rho(t) - z^\rho(s)]^{2k} \mid F_s) &\le K(t - s), \qquad k = 1, \dots, q. \end{aligned} \tag{4.1}$$

If $x$ is a vector in $\mathbb{R}^n$, we define $|x| = [\sum_i (x^i)^2]^{1/2}$; if $(x(\omega) : \omega \in \Omega)$ is an $n$-vector-valued r.v., we define $\|x\| = E(|x|^2)^{1/2}$ whenever this expectation exists.

At this stage we observe that the existence of the integral in Definition 2.1 can be proved, for $q = 1$, under much weaker hypotheses than Condition 4.1; for $q \ge 2$ considerable weakening is also possible, though not as much as for $q = 1$. But the gain in generality is bought at a high price in simplicity. With Condition 4.1, we already have as much inclusiveness as was asked for in (a), and Definition 2.1 is only slightly more complicated than the standard definition of the Riemann integral. Greater inclusiveness would require introducing concepts and methods too sophisticated for many potential users, and would not justify its cost.

The next lemma is an essential element of several later proofs. Its proof differs only trivially from that of Lemma 1 in [5].

Lemma 4.1. Let $F_1, \dots, F_m$ be $\sigma$-subalgebras of $\mathcal{A}$ with $F_1 \subset F_2 \subset \cdots \subset F_m$. Let $u_1, \dots, u_m$ and $A_1, \dots, A_m$ be r.v. with finite second moments such that for each $k$ in $\{1, \dots, m\}$, all $u_j$ with $j \le k$ and all $A_j$ with $j < k$ are $F_k$ measurable. Let $C_j$, $D_j$ for $j = 1, \dots, m$, be numbers such that a.s.

$$|E(A_j \mid F_j)| \le C_j, \qquad E(A_j^2 \mid F_j) \le D_j. \tag{4.2}$$

Then

$$\Big\| \sum_{j=1}^{m} u_j A_j \Big\| \le 2 \sum_{j=1}^{m} C_j \|u_j\| + \Big[ \sum_{j=1}^{m} D_j \|u_j\|^2 \Big]^{1/2}. \tag{4.3}$$
It is convenient to state a frequently used corollary.

Corollary 4.1. Let Condition 4.1 be satisfied, and let $\Pi$ (with notation (2.2)) be a partition of $[a, b]$. For each $j$ in $\{1, \dots, l\}$, let $u_j$ be an $F(t_j)$ measurable r.v. with finite second moment. Then

$$\Big\| \sum_{j=1}^{l} u_j \prod_{k=1}^{q} [z^k(t_{j+1}) - z^k(t_j)] \Big\| \le B \Big[ \sum_{j=1}^{l} \|u_j\|^2 (t_{j+1} - t_j) \Big]^{1/2}, \tag{4.4}$$

where $B = 2K(b - a)^{1/2} + K^{1/2}$.
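Proof. We define

$$A_j = \prod_{k=1}^{q} [z^k(t_{j+1}) - z^k(t_j)], \qquad C_j = D_j = K(t_{j+1} - t_j). \tag{4.5}$$

The hypotheses of Lemma 4.1 are satisfied, so

$$\Big\| \sum_{j=1}^{l} u_j A_j \Big\| \le 2K \sum_{j=1}^{l} \|u_j\| (t_{j+1} - t_j) + \Big[ K \sum_{j=1}^{l} \|u_j\|^2 (t_{j+1} - t_j) \Big]^{1/2}. \tag{4.6}$$

Applying the Cauchy-Buniakowsky-Schwarz inequality to the first sum in the right member, $\sum_j \|u_j\| (t_{j+1} - t_j) \le (b - a)^{1/2} [\sum_j \|u_j\|^2 (t_{j+1} - t_j)]^{1/2}$, yields the desired conclusion.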
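Corollary 4.1 can be checked by simulation in its simplest instance. In this sketch (our own choice of example) $q = 1$ and $z^1$ is a Wiener process $w$ on $[0, 1]$ with its natural $\sigma$-algebras, so Condition 4.1 holds with $K = 1$ and $B = 3$; we take $u_j = w(t_j)$ on a uniform partition, estimate the left member of (4.4) by Monte Carlo, and compute the right member exactly from $\|w(t_j)\|^2 = t_j$.

```python
import numpy as np

def corollary_4_1_check(n_paths=20_000, l=50, K=1.0, seed=2):
    """Monte Carlo left member and exact right member of (4.4) for q = 1, z^1 = w, u_j = w(t_j)."""
    rng = np.random.default_rng(seed)
    a, b = 0.0, 1.0
    t = np.linspace(a, b, l + 1)
    dt = np.diff(t)
    dw = rng.normal(0.0, 1.0, (n_paths, l)) * np.sqrt(dt)   # increments over [t_j, t_{j+1}]
    w = np.hstack((np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)))
    u = w[:, :-1]                                           # u_j = w(t_j) is F(t_j) measurable
    lhs = np.sqrt(np.mean(np.sum(u * dw, axis=1) ** 2))     # Monte Carlo estimate of the L2 norm
    B = 2 * K * np.sqrt(b - a) + np.sqrt(K)
    rhs = B * np.sqrt(np.sum(t[:-1] * dt))                  # uses ||w(t_j)||^2 = t_j
    return lhs, rhs

lhs, rhs = corollary_4_1_check()
print(f"left member ~ {lhs:.3f}, bound = {rhs:.3f}")
```

By the Ito isometry the left member here is exactly $[\sum_j t_j (t_{j+1} - t_j)]^{1/2} = 0.7$, against the bound $2.1$. The slack comes from the term $2K(b - a)^{1/2}$ in $B$, which accounts for increments with nonzero conditional mean; for a Wiener process one could take $C_j = 0$ in Lemma 4.1.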

We can now state and prove an existence theorem of particular importance in the rest of this paper. In this theorem the integrand is assumed to have the following rather strong continuity property:

(d) $f$ is bounded in $L_2$ norm on $T$, and is continuous in $L_2$ norm at almost all points of $[a, b]$.

For such integrands we have the following theorem.

Theorem 4.1. Let $z^1, \dots, z^q$ satisfy Condition 4.1. Let $(f(\tau) : \tau \in T)$ be $F_\cdot$ measurable and satisfy (d). Then for every subinterval $[c, e]$ of $[a, b]$, $f$ has a (belated stochastic) integral over $[c, e]$, and this integral has an $F_e$ measurable version. Moreover, for every $\varepsilon > 0$ there is a $\delta'$ with $0 < \delta' < \delta$ such that, for all subintervals $[c, e]$ of $[a, b]$ and all belated partitions $\Pi$ of $[c, e]$ with $\operatorname{mesh} \Pi < \delta'$,

$$\Big\| S(\Pi; f, z^1, \dots, z^q) - \int_c^e f(t)\, dz^1(t) \cdots dz^q(t) \Big\| < \varepsilon. \tag{4.7}$$

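Proof. Consider the case in which $f$ is continuous in $L_2$ norm on $T$. If $\Pi'$ and $\Pi''$ are belated partitions

$$\Pi' = (t_1, \dots, t_{l+1};\ \tau_1', \dots, \tau_l'), \qquad \Pi'' = (t_1, \dots, t_{l+1};\ \tau_1'', \dots, \tau_l'') \tag{4.8}$$

of $[c, e]$ with the same division points, then

$$S(\Pi'; f, z^1, \dots, z^q) - S(\Pi''; f, z^1, \dots, z^q) = \sum_{j=1}^{l} [f(\tau_j') - f(\tau_j'')] \prod_{k=1}^{q} [z^k(t_{j+1}) - z^k(t_j)]. \tag{4.9}$$

By Corollary 4.1,

$$\| S(\Pi'; f, z^1, \dots, z^q) - S(\Pi''; f, z^1, \dots, z^q) \| \le B(e - c)^{1/2} \max_j \| f(\tau_j') - f(\tau_j'') \|. \tag{4.10}$$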

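For $q = 1$, the restriction to $\Pi'$ and $\Pi''$ with the same division points is easily removed. Given two partitions

$$\Pi' = (t_1', \dots, t_{l'+1}';\ \tau_1', \dots, \tau_{l'}'), \qquad \Pi'' = (t_1'', \dots, t_{l''+1}'';\ \tau_1'', \dots, \tau_{l''}'') \tag{4.11}$$

of $[c, e]$, we say that the latter is obtained from the former by adjunction of division points if each $t_i'$ is one of the $t_j''$, and if $[t_j'', t_{j+1}''] \subset [t_i', t_{i+1}']$ then $\tau_j'' = \tau_i'$. In this case the increments of $z^1$ over the subintervals of each $[t_i', t_{i+1}']$ add up to $z^1(t_{i+1}') - z^1(t_i')$, so it is easily seen that

$$S(\Pi'; f, z^1) = S(\Pi''; f, z^1). \tag{4.12}$$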

If $\Pi'$ and $\Pi''$ are any two belated partitions of $[c, e]$, in computing their Riemann sums (for $q = 1$) there is thus no loss of generality in supposing that $\Pi'$ and $\Pi''$ have the same division points, so

$$\| S(\Pi'; f, z^1) - S(\Pi''; f, z^1) \| \le B(e - c)^{1/2} \max_j \| f(\tau_j') - f(\tau_j'') \|, \tag{4.13}$$

and this can be made arbitrarily small by restricting $\Pi'$ and $\Pi''$ to have small mesh. So the Riemann sums converge in $L_2$ norm as $\operatorname{mesh} \Pi$ tends to 0, and Theorem 4.1 holds if $q = 1$ and $f$ is continuous in $L_2$ norm. The latter restriction can be removed by much the same devices as in the case of the ordinary Riemann integral.

If $q > 1$ and $\Pi''$ is obtained from $\Pi'$ by adjunction of division points, the analogue of (4.12) fails. For example, if $q = 2$ and each $[t_j, t_{j+1}]$ of $\Pi'$ contains
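in its interior either a single division point of $\Pi''$ (which we then call $s_j$) or no such point (in which case we define $s_j$ to be $t_j$), we readily calculate

$$\begin{aligned} S(\Pi'; f, z^1, z^2) - S(\Pi''; f, z^1, z^2) = \sum_{j=1}^{l} f(\tau_j) \big\{ &[z^1(s_j) - z^1(t_j)][z^2(t_{j+1}) - z^2(s_j)] \\ &+ [z^2(s_j) - z^2(t_j)][z^1(t_{j+1}) - z^1(s_j)] \big\}, \end{aligned} \tag{4.14}$$

which is not in general 0. However, we can find a useful estimate of its norm.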
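Define $\mu(\Pi') = \max_j [t_{j+1} - t_j]$. Since we have assumed that Condition 4.1 holds, we find

$$\begin{aligned} &|E([z^1(s_j) - z^1(t_j)][z^2(t_{j+1}) - z^2(s_j)] \mid F(t_j))| \\ &\quad = |E([z^1(s_j) - z^1(t_j)]\, E([z^2(t_{j+1}) - z^2(s_j)] \mid F(s_j)) \mid F(t_j))| \\ &\quad \le K(t_{j+1} - s_j)\, E(|z^1(s_j) - z^1(t_j)| \mid F(t_j)) \\ &\quad \le K(t_{j+1} - s_j)\, [E([z^1(s_j) - z^1(t_j)]^2 \mid F(t_j))]^{1/2} \\ &\quad \le K^{3/2} [\mu(\Pi')]^{1/2} (t_{j+1} - t_j). \end{aligned} \tag{4.15}$$

The same estimate holds with $z^1$ and $z^2$ interchanged. Similarly,

$$E([z^1(s_j) - z^1(t_j)]^2 [z^2(t_{j+1}) - z^2(s_j)]^2 \mid F(t_j)) \le K^2 \mu(\Pi') (t_{j+1} - t_j), \tag{4.16}$$

and likewise with $z^1$ and $z^2$ interchanged. We now apply Lemma 4.1 to each of the two sums in (4.14); by (4.15) and (4.16), we obtain

$$\| S(\Pi'; f, z^1, z^2) - S(\Pi''; f, z^1, z^2) \| \le C [\mu(\Pi')]^{1/2}, \tag{4.17}$$

where

$$C = \sup\{\|f(\tau)\| : \tau \in T\}\, [2K^{3/2}(b - a) + (b - a)^{1/2}]. \tag{4.18}$$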

If $q > 2$, the estimate (4.17) (with a different $C$) is still valid; the proof is not essentially different, but the details are more tedious.

We shall repeatedly use the following procedure.

Procedure 4.1. Given a partition $\Pi = (t_1, \dots, t_{l+1};\ \tau_1, \dots, \tau_l)$, we adjoin to $\Pi$ as new division points the midpoints of all those intervals $[t_j, t_{j+1}]$ such that

$$t_{j+1} - t_j \ge \tfrac{1}{2} \mu(\Pi).$$

We form a sequence of partitions

$$\Pi_0' = \Pi', \Pi_1', \Pi_2', \dots, \Pi_\alpha', \tag{4.19}$$

each formed from the preceding by applying Procedure 4.1. We carry it to a large enough $\alpha$ so that no interval of the original $\Pi'$ remains unsubdivided; then each interval of $\Pi_\alpha'$ will have length at least $\frac{1}{2}\mu(\Pi_\alpha')$. Since

$$\mu(\Pi_i') = 2^{-i} \mu(\Pi'), \qquad i = 0, 1, \dots, \alpha, \tag{4.20}$$

we may also suppose that $\mu(\Pi_\alpha')$ is less than half the length of the smallest interval in $\Pi''$. Next, starting with $\Pi''$, we form the sequence

$$\Pi_0'' = \Pi'', \Pi_1'', \dots, \Pi_\beta'' \tag{4.21}$$

by repeated application of Procedure 4.1. We can and do choose $\beta$ so that

$$2^{-1/2} \mu(\Pi_\alpha') \le \mu(\Pi_\beta'') \le 2^{1/2} \mu(\Pi_\alpha'); \tag{4.22}$$

this is possible by the analogue of (4.20). By (4.17) and (4.20),

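$$\| S(\Pi_\alpha'; f, z^1, z^2) - S(\Pi'; f, z^1, z^2) \| \le \sum_{i=0}^{\alpha-1} C [\mu(\Pi_i')]^{1/2} \le (2 + \sqrt{2})\, C [\mu(\Pi')]^{1/2}. \tag{4.23}$$

Similarly,

$$\| S(\Pi_\beta''; f, z^1, z^2) - S(\Pi''; f, z^1, z^2) \| \le (2 + \sqrt{2})\, C [\mu(\Pi'')]^{1/2}. \tag{4.24}$$

Every interval in $\Pi_\alpha'$ has length at least $\frac{1}{2}\mu(\Pi_\alpha')$, and likewise for $\Pi_\beta''$. So every interval in $\Pi_\alpha'$ has length at least $\mu(\Pi_\beta'')/2^{3/2}$, and vice versa. Thus, each interval of $\Pi_\alpha'$ contains at most three division points of $\Pi_\beta''$. We can adjoin these to $\Pi_\alpha'$ in two stages, obtaining a partition $\Pi_*'$ such that (by (4.17))

$$\| S(\Pi_*'; f, z^1, z^2) - S(\Pi_\alpha'; f, z^1, z^2) \| \le 2C [\mu(\Pi')]^{1/2}. \tag{4.25}$$

Similarly, we can adjoin the division points of $\Pi_\alpha'$ to $\Pi_\beta''$ in at most two stages, obtaining a partition $\Pi_*''$ such that

$$\| S(\Pi_*''; f, z^1, z^2) - S(\Pi_\beta''; f, z^1, z^2) \| \le 2C [\mu(\Pi'')]^{1/2}. \tag{4.26}$$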

Now $\Pi_*'$ and $\Pi_*''$ have the same division points. By (4.10), (4.23), (4.24), (4.25), and (4.26), for $L_2$ continuous $f$, the Riemann sums for $\Pi'$, for $\Pi_\alpha'$, for $\Pi_*'$, for $\Pi_*''$, for $\Pi_\beta''$, and for $\Pi''$ have differences (in the order named) whose $L_2$ norms are arbitrarily small if $\operatorname{mesh} \Pi'$ and $\operatorname{mesh} \Pi''$ are small. This implies that $S(\Pi; f, z^1, \dots, z^q)$ converges in $L_2$ norm as $\operatorname{mesh} \Pi$ tends to 0, and the integral exists. The uniform closeness of Riemann sum to integral follows from the fact that all estimates of $L_2$ norms were uniformly valid; and we could have used $F_e$ everywhere in place of $\mathcal{A}$ without changing anything, which would give us an $F_e$ measurable integral.
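Procedure 4.1 and the halving identity (4.20) used in the proof above are easy to check on a concrete partition. The sketch below assumes the threshold reads $t_{j+1} - t_j \ge \frac{1}{2}\mu(\Pi)$, keeps the evaluation point of a split interval on both halves (as in adjunction of division points, so belated partitions stay belated), and uses dyadic endpoints so that floating-point comparisons are exact. It also evaluates the geometric series behind the constant in (4.23) and (4.24): $\sum_{i \ge 0} 2^{-i/2} = 1/(1 - 2^{-1/2}) = 2 + \sqrt{2}$.

```python
import math

def mu(t):
    """Largest interval length of a partition with division points t."""
    return max(t[j + 1] - t[j] for j in range(len(t) - 1))

def procedure_4_1(t, tau):
    """Adjoin the midpoint of every [t_j, t_{j+1}] whose length is >= mu/2."""
    half = mu(t) / 2
    new_t, new_tau = [t[0]], []
    for j in range(len(tau)):
        if t[j + 1] - t[j] >= half:
            new_t += [(t[j] + t[j + 1]) / 2, t[j + 1]]
            new_tau += [tau[j], tau[j]]      # both halves keep the old evaluation point
        else:
            new_t.append(t[j + 1])
            new_tau.append(tau[j])
    return new_t, new_tau

def refine_until_all_split(t, tau):
    """(4.19): Pi_0 = Pi, Pi_1, ..., stopping once every original interval is subdivided."""
    seq = [(list(t), list(tau))]
    def all_split(cur):
        return all(any(t[j] < s < t[j + 1] for s in cur) for j in range(len(t) - 1))
    while not all_split(seq[-1][0]):
        seq.append(procedure_4_1(*seq[-1]))
    return seq

t0 = [0.0, 0.125, 0.1875, 0.625, 1.0]
tau0 = [0.0, 0.0625, 0.125, 0.625]          # tau_j <= t_j: a belated partition
seq = refine_until_all_split(t0, tau0)
print([mu(s[0]) for s in seq])              # [0.4375, 0.21875, 0.109375, 0.0546875]
print(sum(2 ** (-i / 2) for i in range(40)), 2 + math.sqrt(2))
```

Each application halves $\mu$ exactly, because the longest interval is always split; and once every original interval has been subdivided, no interval of the last partition is shorter than half its $\mu$, as the proof requires.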
5. An estimate, and a second existence theorem

Suppose that the hypotheses of Theorem 4.1 are satisfied, and that $\Pi$ (with notation (2.2)) is a belated partition of a subinterval $[c, e]$ of $[a, b]$. By Corollary 4.1,

$$\| S(\Pi; f, z^1, \dots, z^q) \| \le B \Big[ \sum_{j=1}^{l} \|f(\tau_j)\|^2 (t_{j+1} - t_j) \Big]^{1/2}. \tag{5.1}$$

But by the special case of Theorem 4.1 in which $\Omega$ contains a single point and $q = 1$ and $z(t) = t$, the belated integral of $\|f\|^2$ with respect to $t$ exists. Since Cauchy partitions are both Riemann partitions and belated partitions, the belated integral of $\|f\|^2$ is its Riemann integral, and by letting $\operatorname{mesh} \Pi$ tend to 0 we obtain from (5.1)

$$\Big\| \int_c^e f(t)\, dz^1(t) \cdots dz^q(t) \Big\| \le B \Big[ \int_c^e \|f(t)\|^2\, dt \Big]^{1/2}. \tag{5.2}$$

If $T$ is an interval, and $f$ is $F_\cdot$ measurable and $(f(\tau, \omega) : \tau \in T,\ \omega \in \Omega)$ is $d\tau \times dP$ measurable on $T \times \Omega$, and

$$\int_a^b E[f(\tau)^2]\, d\tau < \infty, \tag{5.3}$$

it is possible to find (as in [1], p. 440) a sequence of bounded processes $f_1, f_2, \dots$ satisfying the hypotheses of Theorem 4.1 such that

$$\lim_{n \to \infty} \int_a^b E(|f_n - f|^2)\, d\tau = 0; \tag{5.4}$$

in fact, by a slight modification of the construction in [1] we may choose $f_n$ that are continuous in $L_2$ norm. Then by (5.2) the integrals

$$\int_a^b f_n(\tau)\, dz^1(\tau) \cdots dz^q(\tau), \qquad n = 1, 2, 3, \dots, \tag{5.5}$$

form a Cauchy sequence in $L_2(\Omega, \mathcal{A}, P)$, and hence have a limit in that space. We can accept this limit as the definition of the integral of $f$ with respect to $(z^1, \dots, z^q)$, thus extending the class of integrable functions so as to have the same sort of closure properties as the Ito integral. Such properties are valuable in many investigations. But in this paper we have no need of them, so we pursue this no further.

In Definition 2.1, we used the concept of convergence in probability. In Theorem 4.1, we obtained more: the Riemann sums converged in $L_2$ norm. There is an intermediate kind of convergence that is sometimes encountered, which we shall call uniform convergence in near $L_2$ norm. We define it in the setting of functions of partitions, although it evidently can be applied to more general limit processes. (In the definition, $1_A$ denotes the indicator function of the set $A$.)
Definition 5.1. Assume that to each subinterval $[c, e]$ of $[a, b]$ and to each belated partition $\Pi$ of $[c, e]$ there corresponds an r.v. $x(\Pi, [c, e])$ and an r.v. $x_0([c, e])$. Then $x(\Pi, [c, e])$ converges to $x_0([c, e])$ in near $L_2$ norm, uniformly on subintervals $[c, e]$ of $[a, b]$, if to each positive $\varepsilon$ there corresponds a positive $\delta$ and a subset $A$ of $\Omega$ with $P(A) > 1 - \varepsilon$ such that for all $[c, e] \subset [a, b]$ and all $\Pi$ with $\operatorname{mesh} \Pi < \delta$,

$$\| 1_A [x(\Pi, [c, e]) - x_0([c, e])] \| < \varepsilon. \tag{5.6}$$

Clearly this implies uniform convergence in the metric of convergence in probability, and is implied by uniform convergence in $L_2$ norm.

Many processes $f$ possess the following important property:

(e) $f$ is separable, and with probability 1 the sample function $[f(\tau, \omega) : \tau \in T]$ is bounded. (Note that the bound is not assumed to be independent of $\omega$.)

For integrands $f$ with this property, we can prove the following theorem.

Theorem 5.1. Let Condition 4.1 be satisfied. Let $[f(\tau) : \tau \in T]$ satisfy (e), be $F_\cdot$ measurable, and be continuous in probability at almost all points of $[a, b]$. Then for every subinterval $[c, e]$ of $[a, b]$, $f$ has a belated integral with respect to $(z^1, \dots, z^q)$ over $[c, e]$, and this integral has an $F_e$ measurable version. Moreover, the Riemann sums $S(\Pi; f, z^1, \dots, z^q)$ corresponding to belated partitions $\Pi$ of $[c, e]$ converge to the integral over $[c, e]$ uniformly in near $L_2$ norm as $\operatorname{mesh} \Pi \to 0$.

Proof. Let $S \subset T$ be a separating set for $f$. There is a subset $\Lambda$ of $\Omega$ with $P(\Lambda) = 0$ such that for every open interval $I$ and every $\omega$ in $\Omega - \Lambda$, the functions $[f(\tau, \omega) : \tau \in I \cap T]$ and $[f(\tau, \omega) : \tau \in I \cap S]$ have equal suprema and equal infima. Let $\varepsilon$ be positive. For each positive $N$ we define $A_N(\tau)$ ($\tau \in T$) to be the set of all $\omega$ in $\Omega$ such that $|f(s, \omega)| \le N$ for $s = \tau$ and for all $s < \tau$ in $S$. This set is measurable, and $P[A_N(\tau)]$ is nonincreasing. By (e), we can choose $N$ large enough so that

$$P[A_N(b)] > 1 - \varepsilon. \tag{5.7}$$

Let $\phi_N(\tau, \cdot)$ be the indicator function of $A_N(\tau)$. Then by definition of $A_N$, $f\phi_N$ is bounded. It is also fairly obviously $F_\tau$ measurable at each $\tau$ in $T$, and it is continuous in probability (hence, being bounded, it is continuous in $L_2$ norm) except on the union of the null set of discontinuities of $f$ and the countable set of discontinuities of $P[A_N(\tau)]$. So, by Theorem 4.1, for every $[c, e] \subset [a, b]$ the Riemann sums

$$S(\Pi; f\phi_N, z^1, \dots, z^q) \tag{5.8}$$

converge as $\operatorname{mesh} \Pi \to 0$ to the integral of $f\phi_N$ over $[c, e]$, uniformly with respect to $[c, e]$. But the sums (5.8) coincide on $A_N(b) - \Lambda$ with

$$S(\Pi; f, z^1, \dots, z^q). \tag{5.9}$$

From this and (5.7), it follows readily that the sums (5.9) converge in near $L_2$ norm to a limit, which is by definition the integral of $f$ over $[c, e]$; and the convergence is uniform with respect to $[c, e]$.
6. Examples

Suppose first that $z^1$ and $z^2$ are both the same Wiener process $w$. Then if $a \le s \le t \le b$, $w(t) - w(s)$ is independent of $F_s$. Let $\Pi$ (with the usual notation (2.2)) be a belated partition, and define

$$\Delta_j = [w(t_{j+1}) - w(t_j)]^2 - (t_{j+1} - t_j), \qquad j = 1, \dots, l. \tag{6.1}$$

Then $E(\Delta_j \mid F(t_j)) = 0$ and $E(\Delta_j^2 \mid F(t_j)) = 2(t_{j+1} - t_j)^2$.

If $f$ satisfies the hypotheses of Theorem 4.1, then by Lemma 4.1,

$$\Big\| \sum_{j=1}^{l} f(\tau_j) \Delta_j \Big\| \le \Big[ \sum_{j=1}^{l} 2 \|f(\tau_j)\|^2 (t_{j+1} - t_j)^2 \Big]^{1/2}, \tag{6.2}$$

which tends to 0 with $\operatorname{mesh} \Pi$. By Theorem 4.1, $S(\Pi; f, z^1, z^2)$ has a limit as $\operatorname{mesh} \Pi \to 0$; by (6.1) and (6.2), $S(\Pi; f, t)$ has the same limit. So if $f$ satisfies the hypotheses of Theorem 4.1, we have

$$\int_a^b f(t)\, [dw(t)]^2 = \int_a^b f(t)\, dt. \tag{6.3}$$

This also holds if $f$ satisfies the hypotheses of Theorem 5.1.
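Identity (6.3) can be watched numerically. In the sketch below, which is our own illustration, the integrand is $f = w$ itself ($F_\cdot$ measurable, bounded and continuous in $L_2$ norm on $[0, 1]$, so Theorem 4.1 applies); each evaluation point lags its division point by a few grid steps, so the partition is belated; and the right member of (6.3) is approximated by the trapezoid rule on the fine grid. Grid sizes, the lag, and the seed are arbitrary.

```python
import numpy as np

def check_6_3(n_fine=2**18, l=2**14, lag=3, seed=3):
    """Belated Riemann sum of f (dw)^2 with f = w, versus the integral of w dt on [0, 1]."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_fine
    w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_fine))))
    step = n_fine // l
    idx = np.arange(0, n_fine + 1, step)          # grid indices of t_1 < ... < t_{l+1}
    tau_idx = np.maximum(idx[:-1] - lag, 0)       # tau_j <= t_j: belated evaluation points
    belated_sum = float(np.sum(w[tau_idx] * np.diff(w[idx]) ** 2))  # S(Pi; w, w, w)
    time_integral = float(np.sum(0.5 * (w[:-1] + w[1:]) * dt))      # trapezoid for int w dt
    return belated_sum, time_integral

s, integral = check_6_3()
print(s, integral)
```

By (6.2) the difference has $L_2$ norm about $(\operatorname{mesh} \Pi)^{1/2}$, roughly 0.008 for these sizes, so the two printed numbers should agree to within a few hundredths.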

The next theorem is useful because it often permits us to discard integrals with several $dz^\rho$. It applies to disturbances that satisfy the following condition.

Condition 6.1. To each positive $\varepsilon$ there correspond a positive $\delta$ and a set $A \subset \Omega$ with $P(A) > 1 - \varepsilon$ such that if $a \le s \le t \le b$ and $t - s < \delta$,

(6.4)  $|z^\rho(t, \omega) - z^\rho(s, \omega)| < \varepsilon (t - s)^{1/3}$

for all $\omega$ in $A$.

For example, by a well-known theorem of Kolmogorov (see Neveu [7], p. 97), $z^\rho$ satisfies Condition 6.1 if there is a constant $K$ such that

(6.5)  $E([z^\rho(t) - z^\rho(s)]^8) \le K(t - s)^4, \quad a \le s \le t \le b.$

Theorem 6.1. Let the hypotheses of Theorem 4.1 or Theorem 5.1 be satisfied. If $q \ge 3$, and $z^1, \dots, z^q$ satisfy Condition 6.1, then

(6.6)  $\int_a^b f(t)\,dz^1(t) \cdots dz^q(t) = 0.$

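Proof. Suppose first that $|f(\tau, \omega)|$ has an upper bound $M$ on $T \times \Omega$. Let $\varepsilon$ be positive, and let $\delta$ and $A$ serve for all $z^\rho$ in Condition 6.1. If $\Pi$ (with notation (2.2)) is a belated partition with mesh $\Pi < \min(1, \delta)$, then, since $(t_{j+1} - t_j)^{q/3} \le t_{j+1} - t_j$ when $q \ge 3$ and $t_{j+1} - t_j < 1$, for all $\omega$ in $A$ we have

(6.7)  $\Bigl| \sum_{j} f(\tau_j, \omega) \prod_{\rho=1}^{q} [z^\rho(t_{j+1}, \omega) - z^\rho(t_j, \omega)] \Bigr| \le M\varepsilon^q (b - a).$

So the Riemann sums converge in near L2 norm to 0, and the integral is 0.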
If $f$ satisfies the hypotheses of Theorem 4.1, for each positive $N$ we define

(6.8)
$$f_N(\tau, \omega) = \begin{cases} f(\tau, \omega) & \text{if } -N \le f(\tau, \omega) \le N, \\ N & \text{if } f(\tau, \omega) > N, \\ -N & \text{if } f(\tau, \omega) < -N. \end{cases}$$

By the proof just completed, the integral of $f_N$ is 0 for all $N$. So by (5.2),

(6.9)  $\Bigl\| \int_a^b f(t)\,dz^1(t) \cdots dz^q(t) \Bigr\| \le B \Bigl\{ \int_a^b \|f(t) - f_N(t)\|^2\,dt \Bigr\}^{1/2}.$

The right member tends to 0 as $N \to \infty$, so the left member is 0.

If the hypotheses of Theorem 5.1 are satisfied, then, with the notation of that theorem, $f\varphi_N$ has integral 0 for all $N$, so the integral of $f$ is 0.
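The mechanism behind (6.6) can be seen numerically in the simplest case, chosen here for illustration only: $f \equiv 1$, $q = 3$, and $z^1 = z^2 = z^3 = w$, a Wiener process (which satisfies Condition 6.1). The sketch returns both $\sum_j (\Delta_j w)^3$ and the bound $\sum_j |\Delta_j w|^3$ of the kind used in (6.7). On a uniform partition of $[0, 1]$ into $n$ pieces the latter has expectation $2\sqrt{2/\pi}\,n^{-1/2}$, so both sums shrink with the mesh.

```python
import math
import random


def cubic_sums(n, seed=0, T=1.0):
    """For one simulated Wiener path on a uniform partition of [0, T] into
    n pieces, return (sum of dw**3, sum of |dw|**3)."""
    rng = random.Random(seed)
    h = T / n
    signed = 0.0
    absolute = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(h))
        signed += dw ** 3
        absolute += abs(dw) ** 3
    return signed, absolute


for n in (100, 10_000, 100_000):
    s, a = cubic_sums(n, seed=2)
    print(n, s, a, 2 * math.sqrt(2 / math.pi) / math.sqrt(n))  # last column: E of the bound
```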

There  are  other  useful  sets  of  conditions  that  eliminate  integrals,  but  we  will 
confine  ourselves  to  two  simple  cases. 

Theorem 6.2. If the hypotheses of Theorem 4.1 or of Theorem 5.1 hold with $q \ge 2$, and $z^q(t) = t$, then $\int_a^b f(t)\,dz^1 \cdots dz^q = 0$.

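Proof. With $\Pi$ as in (2.2), write $\Delta_j z^k = z^k(t_{j+1}) - z^k(t_j)$ and define

(6.10)  $\Delta_j = \prod_{k=1}^{q} \Delta_j z^k = (t_{j+1} - t_j) \prod_{k=1}^{q-1} \Delta_j z^k.$

Then

(6.11)  $|E(\Delta_j \mid F(t_j))| = (t_{j+1} - t_j) \Bigl| E\Bigl( \prod_{k=1}^{q-1} \Delta_j z^k \Bigm| F(t_j) \Bigr) \Bigr|,$

(6.12)  $E(\Delta_j^2 \mid F(t_j)) = (t_{j+1} - t_j)^2\, E\Bigl( \prod_{k=1}^{q-1} [\Delta_j z^k]^2 \Bigm| F(t_j) \Bigr).$

If $q = 2$, the right members of (6.11) and (6.12) do not exceed $K(t_{j+1} - t_j)^2$ and $K(t_{j+1} - t_j)^3$, respectively, by Condition 4.1; so, under the hypotheses of Theorem 4.1, Lemma 4.1 assures us that $\|S(\Pi; f, z^1, \dots, z^q)\|$ tends to 0 with mesh $\Pi$. If the hypotheses of Theorem 5.1 hold, the conclusion is established by the use of the functions $\varphi_N$ of Theorem 5.1.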

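If $q > 2$, let $r$ and $s$ be integers at most $q/2$ with $r + s = q - 1$. Then, by the conditional Cauchy-Schwarz inequality,

(6.13)  $E\Bigl( \Bigl| \prod_{k=1}^{q-1} \Delta_j z^k \Bigr| \Bigm| F(t_j) \Bigr) \le \Bigl\{ E\Bigl( \prod_{k=1}^{r} [\Delta_j z^k]^2 \Bigm| F(t_j) \Bigr) \Bigr\}^{1/2} \Bigl\{ E\Bigl( \prod_{k=r+1}^{q-1} [\Delta_j z^k]^2 \Bigm| F(t_j) \Bigr) \Bigr\}^{1/2}.$

Since $r \ge 1$ and $2r \le q$, by Condition 4.1 the first factor in the right member of (6.13) does not exceed a constant multiple of $(t_{j+1} - t_j)^{1/2}$. The same is true of the second factor, so the left member of (6.11) does not exceed a multiple of $(t_{j+1} - t_j)^2$. Likewise the left member of (6.12) does not exceed a multiple of $(t_{j+1} - t_j)^3$.

The rest of the proof is as for $q = 2$.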
Theorem 6.3. Let $z^1$ and $z^2$ be processes such that if $a \le s \le t \le b$, then $z^1(t) - z^1(s)$ and $z^2(t) - z^2(s)$ are conditionally independent given $F_s$. Let the hypotheses of Theorem 4.1 hold. Then

(6.14)  $\int_a^b f(t)\,dz^1(t)\,dz^2(t) = 0.$

Proof. We use the same notation as in the preceding proof. Then

(6.15)  $|E(\Delta_j z^1 \Delta_j z^2 \mid F(t_j))| = |E(\Delta_j z^1 \mid F(t_j))\,E(\Delta_j z^2 \mid F(t_j))| \le K^2 (t_{j+1} - t_j)^2,$

(6.16)  $E([\Delta_j z^1 \Delta_j z^2]^2 \mid F(t_j)) = E([\Delta_j z^1]^2 \mid F(t_j))\,E([\Delta_j z^2]^2 \mid F(t_j)) \le K^2 (t_{j+1} - t_j)^2.$

By Lemma 4.1, $\|S(\Pi; f, z^1, z^2)\|$ tends to 0 with mesh $\Pi$.
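Two independent Wiener processes satisfy the hypothesis of Theorem 6.3, and the conclusion can be checked numerically in the simple case $f \equiv 1$ (an assumption made only for this illustration). The sum $\sum_j \Delta_j z^1 \Delta_j z^2$ has mean 0 and variance $\sum_j (\Delta_j t)^2$, which equals $1/n$ on a uniform partition of $[0, 1]$ into $n$ pieces.

```python
import math
import random


def cross_sum(n, seed=0, T=1.0):
    """sum_j (z1(t_{j+1}) - z1(t_j)) * (z2(t_{j+1}) - z2(t_j)) for two
    independent simulated Wiener paths on a uniform partition of [0, T]."""
    rng = random.Random(seed)
    h = T / n
    total = 0.0
    for _ in range(n):
        d1 = rng.gauss(0.0, math.sqrt(h))
        d2 = rng.gauss(0.0, math.sqrt(h))
        total += d1 * d2
    return total


for n in (100, 10_000, 100_000):
    print(n, cross_sum(n, seed=3), 1 / math.sqrt(n))  # value vs. its standard deviation
```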

7.  Existence  theorem  for  a  functional  equation 

If the hypotheses of Theorem 4.1 are satisfied and we define a process $Y$ on $[a, b]$ by setting

(7.1)  $Y(t) = \int_a^t f(s)\,dz^1(s) \cdots dz^q(s), \quad t \in [a, b],$

we know by Theorem 4.1 that $Y(t)$ has finite second moment and can be chosen $F_t$ measurable. By (5.2), it satisfies a Hölder condition of exponent 1/2 in L2 norm. Processes with these properties occur often enough in succeeding pages to justify giving them a name.

Definition 7.1. Let $H_{1/2}(T, F_\cdot)$ be the class of all (real or vector valued) processes $x$ on $T$ such that for all $t$ in $T$, $x(t)$ is $F_t$ measurable and $E(|x(t)|^2) < \infty$, and there is a number $H^*$ such that if $s$ and $t$ are in $T$ and $s \le t$,

(7.2)  $\|x(t) - x(s)\| \le H^*(t - s)^{1/2}.$

Corollary 7.1. If the hypotheses of Theorem 4.1 are satisfied and $Y$ is defined by (7.1), then $Y$ belongs to $H_{1/2}([a, b], F_\cdot)$.

Instead of restricting ourselves to stochastic “differential equations” such as (3.3), we shall discuss a class of functional equations

(7.3)  $x^i(\tau) = y^i(\tau), \quad \tau \in T,\ \tau \le a,$

(7.4)  $x^i(t) = y^i(t) + \int_a^t g^i_0(s, x)\,ds + \sum_{(\rho, \sigma, \dots, \phi) \in \mathscr{S}} \int_a^t g^i_{\rho,\sigma,\dots,\phi}(s, x)\,dz^\rho(s)\,dz^\sigma(s) \cdots dz^\phi(s), \quad i = 1, \dots, n;\ a \le t \le b,$

where the letters $\rho, \sigma, \dots, \phi$ denote members of $\{1, \dots, r\}$, and $\mathscr{S}$ is a finite set of finite ordered sequences $(\rho, \sigma, \dots, \phi)$ of members of $\{1, \dots, r\}$. The functions $g^i_0, g^i_{\rho,\sigma,\dots,\phi}$ will be called coefficients. We shall make the following assumptions.

Assumption 7.1. The class $\mathscr{X}$ is a linear class of $n$-vector valued processes on $T$ that contains $H_{1/2}(T, F_\cdot)$, and is closed under uniform convergence in L2 norm.

Assumption 7.2. Each coefficient $g$ is defined on $T \times \mathscr{X}$, and for fixed $x$ in $\mathscr{X}$, $g(\cdot, x)$ is bounded in L2 norm on $T$ and is continuous in L2 norm at almost all points of $[a, b]$.

Assumption 7.3. If $F$ is a $\sigma$-subalgebra of $\mathscr{A}$, and $t \in T$, and $x$ is a process in $\mathscr{X}$ such that $x(\tau)$ is $F$ measurable for all $\tau \le t$ in $T$, then $g(t, x)$ is also $F$ measurable.

For Theorem 7.1 it would be adequate to choose $H_{1/2}(T, F_\cdot)$ for $\mathscr{X}$. However, in the case of differential equations a little more latitude is convenient. Suppose that $G^i_0$ and $G^i_{\rho,\sigma,\dots,\phi}$ are functions on $T \times R^n$ such that, for a certain subset $N_0$ of $T$ with Lebesgue measure 0 and a certain positive $L$, $G_0$ and $G_{\rho,\sigma,\dots,\phi}$ are continuous in all variables at all points $(t, x)$ with $t \in T - N_0$ and $x \in R^n$, and for all $t$ in $T$ and $x_1, x_2$ in $R^n$

(7.5)  $|G_0(t, x_1) - G_0(t, x_2)| \le L|x_1 - x_2|,$

and likewise for the $G_{\rho,\sigma,\dots,\phi}$. Then for all processes $x$ with finite second moments we can define

(7.6)  $g^i_0(t, x) = G^i_0(t, x(t)),$

(7.7)  $g^i_{\rho,\sigma,\dots,\phi}(t, x) = G^i_{\rho,\sigma,\dots,\phi}(t, x(t)), \quad t \in T,$

and Assumption 7.3 is satisfied. To attain Assumptions 7.1 and 7.2 also, we can make the following assumption.

Assumption 7.4. $\mathscr{X}$ is the class of all processes bounded in L2 norm on $T$ and continuous in L2 norm at almost all points of $[a, b]$.

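We can simplify notation a little by defining $z^0(t) = t$ for all real $t$, and adjoining the one-element sequence $(0)$ to $\mathscr{S}$. With this understanding, equations (7.3) and (7.4) take the notationally simpler form

(7.8)  $x^i(\tau) = y^i(\tau), \quad \tau \in T,\ \tau \le a,$

(7.9)  $x^i(t) = y^i(t) + \sum_{(\rho, \sigma, \dots, \phi) \in \mathscr{S}} \int_a^t g^i_{\rho,\sigma,\dots,\phi}(s, x)\,dz^\rho(s)\,dz^\sigma(s) \cdots dz^\phi(s), \quad i = 1, \dots, n;\ a \le t \le b.$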

Theorem 7.1. Let the coefficients in (7.8) and (7.9) satisfy Assumptions 7.2 and 7.3, and let the $z^\rho$ satisfy Condition 4.1. Assume also that there exists a positive $L$ such that if $x_1$ and $x_2$ are in $\mathscr{X}$ and $t \in [a, b]$, then for each coefficient $g$ in (7.8) and (7.9)

(7.10)  $\|g(t, x_1) - g(t, x_2)\| \le L \sup\{\|x_1(s) - x_2(s)\| : s \in T,\ s \le t\}.$

Let $y$ belong to $\mathscr{X}$ and be $F_\cdot$ measurable. Then there is an $F_\cdot$ measurable process $x(\cdot)$ in $\mathscr{X}$ such that $x^i(\tau) = y^i(\tau)$ for $\tau \in T$, $\tau \le a$, and (7.9) holds for $a \le t \le b$. If $y(\cdot)$ belongs to $H_{1/2}(T, F_\cdot)$, so does $x(\cdot)$. Moreover, if $x_1$ is any $F_\cdot$ measurable process satisfying (7.8) and (7.9), then $P[x_1(t) = x(t)] = 1$ for all $t$ in $T$.

Proof. Hypothesis (7.10) guarantees that the coefficients are nonanticipative: if $x_1$ and $x_2$ belong to $\mathscr{X}$ and $x_1(\tau) = x_2(\tau)$ whenever $\tau \in T$ and $\tau \le t$, then $g(t, x_1) = g(t, x_2)$. If $x$ is defined only on the part of $T$ in $(-\infty, t]$ and has an extension $\bar{x}$ to $T$ that belongs to $\mathscr{X}$, then by (7.10) all such extensions $\bar{x}$ give the same value to $g(t, \bar{x})$. To simplify notation, we shall define $g(t, x)$ to mean that common value.

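We use Picard's method. We define $x_0 = y$, and then successively

(7.11)  $x^i_{k+1}(\tau) = y^i(\tau), \quad \tau \in T,\ \tau \le a,$

(7.12)  $x^i_{k+1}(t) = y^i(t) + \sum_{(\rho, \sigma, \dots, \phi) \in \mathscr{S}} \int_a^t g^i_{\rho,\sigma,\dots,\phi}(s, x_k)\,dz^\rho\,dz^\sigma \cdots dz^\phi, \quad a \le t \le b.$

By hypothesis $x_0$ is in $\mathscr{X}$. If we assume $x_k$ in that class, the integrands in (7.12) satisfy the hypotheses of Theorem 4.1, so by Corollary 7.1 the integrals in (7.12) belong to $H_{1/2}(T, F_\cdot)$, and by Assumption 7.1 $x_{k+1}$ belongs to $\mathscr{X}$. Thus (7.11) and (7.12) define $x_k$ for $k = 0, 1, 2, 3, \dots$. Define, for every process $x$ on $T$ and every $t$ in $T$,

(7.13)  $N(t, x) = \sup\{\|x(\tau)\| : \tau \in T,\ \tau \le t\}.$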

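If $g^i_{\rho,\sigma,\dots,\phi}$ is one of the coefficients in (7.8) and (7.9) and $k \ge 1$, then by (5.2) and hypothesis (7.10),

(7.14)
$$\begin{aligned}
\Bigl\| \int_a^t [g^i_{\rho,\dots,\phi}(s, x_k) - g^i_{\rho,\dots,\phi}(s, x_{k-1})]\,dz^\rho \cdots dz^\phi \Bigr\|
&\le B \Bigl\{ \int_a^t \|g^i_{\rho,\dots,\phi}(s, x_k) - g^i_{\rho,\dots,\phi}(s, x_{k-1})\|^2\,ds \Bigr\}^{1/2} \\
&\le B \Bigl\{ \int_a^t L^2 N(s, x_k - x_{k-1})^2\,ds \Bigr\}^{1/2}.
\end{aligned}$$

Let $B_0$ be the product of $n$, $B$, and the number of sequences in the set $\mathscr{S}$. By (7.11), (7.12), and (7.14),

(7.15)  $\|x_{k+1}(t) - x_k(t)\| \le B_0 L \Bigl\{ \int_a^t N(s, x_k - x_{k-1})^2\,ds \Bigr\}^{1/2}.$

Since this estimate is still valid if we replace $t$ in the left member by any smaller member of $T$,

(7.16)  $N(t, x_{k+1} - x_k)^2 \le B_0^2 L^2 \int_a^t N(s, x_k - x_{k-1})^2\,ds.$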
We can now prove by induction (with $x_{-1} = 0$) that

(7.17)  $N(t, x_k - x_{k-1})^2 \le \Bigl\{ \sup_{\tau \in T} \|y(\tau)\|^2 \Bigr\} \dfrac{(B_0 L)^{2k} (t - a)^k}{k!}, \quad k = 0, 1, 2, \dots$

For $k = 0$ this is simply the statement $N(t, y)^2 \le \sup \|y(\tau)\|^2$. If (7.17) holds for $k$, then by (7.16),

(7.18)  $N(t, x_{k+1} - x_k)^2 \le B_0^2 L^2 \int_a^t \{\sup \|y\|^2\} \dfrac{(B_0 L)^{2k} (s - a)^k}{k!}\,ds = \{\sup \|y\|^2\} \dfrac{(B_0 L)^{2k+2} (t - a)^{k+1}}{(k + 1)!},$

so (7.17) holds for all nonnegative integers $k$. It follows at once that the sums

(7.19)  $x_m = \sum_{k=0}^{m} (x_k - x_{k-1})$

converge uniformly in L2 norm to a limit, which we call $x$. This limit belongs to $\mathscr{X}$ (and to $H_{1/2}(T, F_\cdot)$ when $y$ does), and by (7.11) and (7.12) it satisfies (7.8) and (7.9).

If $x'$ and $x''$ both satisfy (7.8) and (7.9) and are $F_\cdot$ measurable, then just as we proved (7.16) we can prove

(7.20)  $N(t, x' - x'')^2 \le B_0^2 L^2 \int_a^t N(s, x' - x'')^2\,ds.$

The only solution of this is $N(t, x' - x'') = 0$, which completes the proof.
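Picard's method is easy to imitate on a computer once the stochastic integrals are replaced by their left-endpoint Riemann sums on a fixed grid. The sketch below does this for a hypothetical scalar instance of (7.9) with $y \equiv 1$, no $dt$ term, and the single coefficient $g(s, x) = x(s)$ (so $L = 1$); the grid, the seed, and the discretization are assumptions of the illustration, not part of the theorem. On average the gaps $\max_i |x_{k+1}(t_i) - x_k(t_i)|$ decay roughly like $1/\sqrt{k!}$, mirroring the factorial in (7.17), and the iterates settle on the discrete fixed point $\prod_j (1 + \Delta_j w)$.

```python
import math
import random


def picard_iterates(y0, dw, iterations):
    """Picard iteration for the discretized equation
        x(t_i) = y0 + sum_{j < i} x(t_j) * dw[j]
    Returns [x_0, x_1, ...], each a list of values at t_0, ..., t_n."""
    n = len(dw)
    x = [y0] * (n + 1)              # x_0 = y
    iterates = [x]
    for _ in range(iterations):
        new = [y0] * (n + 1)
        running = y0
        for i in range(n):
            running += x[i] * dw[i]  # nonanticipating: uses x_k at t_i only
            new[i + 1] = running
        x = new
        iterates.append(x)
    return iterates


rng = random.Random(4)
n = 1000
dw = [rng.gauss(0.0, math.sqrt(1.0 / n)) for _ in range(n)]
its = picard_iterates(1.0, dw, 40)
gaps = [max(abs(u - v) for u, v in zip(new, old))
        for new, old in zip(its[1:], its[:-1])]
for k in (1, 5, 10, 20, 40):
    print(k, gaps[k - 1])

fixed_point = math.prod(1.0 + d for d in dw)  # discrete solution at t_n
print(its[-1][-1], fixed_point)
```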


8.  Cauchy-Maruyama  approximations 

G. Maruyama [4] has extended the well-known Cauchy (or Euler) method of constructing polygonal approximate solutions, proceeding successively from each vertex to the next, to the stochastic differential equations (1.3). It is easy to extend this procedure still further to equations of the form of (7.8) and (7.9). Given any Cauchy partition

(8.1)  $\Pi = (t_1, t_2, \dots, t_{k+1}), \quad a = t_1 < t_2 < \cdots < t_{k+1} = b,$

of $[a, b]$, we first define $\bar{x}(\tau) = y(\tau)$ for all $\tau$ in $T$ with $\tau \le a$. Then, $\bar{x}$ having been defined for all $\tau$ in $T$ with $\tau \le t_j$, we define it on $(t_j, t_{j+1}]$ by setting

(8.2)  $\bar{x}^i(t) = \bar{x}^i(t_j) + y^i(t) - y^i(t_j) + \sum_{(\rho, \sigma, \dots, \phi) \in \mathscr{S}} g^i_{\rho,\sigma,\dots,\phi}(t_j, \bar{x})\,[z^\rho(t) - z^\rho(t_j)][z^\sigma(t) - z^\sigma(t_j)] \cdots [z^\phi(t) - z^\phi(t_j)].$

(Notice that the coefficients are defined, even though $\bar{x}$ has been defined only up to $t_j$, by the first paragraph of the proof of Theorem 7.1.)
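In the simplest instance of (8.2), with $y \equiv x_0$, one Wiener process $z^1$, and only the sequences $(0)$ and $(1)$, each step freezes the coefficients at the left vertex: $\bar{x}(t_{j+1}) = \bar{x}(t_j) + g_0(t_j, \bar{x})\Delta t + g_1(t_j, \bar{x})[z^1(t_{j+1}) - z^1(t_j)]$. The sketch below (a scheme nowadays usually called Euler-Maruyama) applies this to the hypothetical linear equation $dx = \mu x\,dt + \sigma x\,dw$, chosen because, with the left-endpoint (Itô-type) integrals used here, its solution is known in closed form, $x(t) = x_0 \exp\{(\mu - \sigma^2/2)t + \sigma w(t)\}$. The parameter values and the seed are arbitrary.

```python
import math
import random


def cauchy_maruyama(x0, g0, g1, a, b, dw):
    """Polygon of (8.2) for dx = g0(t, x) dt + g1(t, x) dz with one driving
    process: coefficients are frozen at the left vertex t_j of each step.
    dw holds the increments of z over a uniform partition of [a, b]."""
    n = len(dw)
    h = (b - a) / n
    x = x0
    path = [x]
    for j in range(n):
        t_j = a + j * h
        x = x + g0(t_j, x) * h + g1(t_j, x) * dw[j]
        path.append(x)
    return path


mu, sigma = 0.1, 0.5          # hypothetical coefficients
n = 100_000
rng = random.Random(5)
dw = [rng.gauss(0.0, math.sqrt(1.0 / n)) for _ in range(n)]
path = cauchy_maruyama(1.0, lambda t, x: mu * x, lambda t, x: sigma * x, 0.0, 1.0, dw)
exact = math.exp((mu - 0.5 * sigma ** 2) + sigma * sum(dw))  # closed form at t = 1
print(path[-1], exact)
```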

We can prove that under the hypotheses of Theorem 7.1 these Cauchy-Maruyama functions $\bar{x}$ converge to the solution $x$ of (7.8) and (7.9) uniformly in L2 norm as mesh $\Pi \to 0$. But in Sections 9 and 11, we shall need a different approximation, in which we shall permit a small departure from equality in (8.2).

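We shall suppose that $\mathscr{X}$ is defined by Assumption 7.4, and we shall adopt the abbreviations

(8.3)  $\Delta_j t = t_{j+1} - t_j, \quad \Delta_j y = y(t_{j+1}) - y(t_j), \quad \Delta_j z^\rho = z^\rho(t_{j+1}) - z^\rho(t_j).$

Suppose now that to each Cauchy partition $\Pi$ (with notation (8.1)) there corresponds a process $\bar{x}$ with the following properties:

(f) $\bar{x}(\tau) = y(\tau)$ for $\tau \in T$, $\tau \le a$;

(g) to each positive $\varepsilon$ there corresponds a positive $\delta$ such that, if mesh $\Pi < \delta$,

(8.4)  $\Bigl\| \bar{x}(t) - \bar{x}(t_j) - y(t) + y(t_j) - \sum_{\mathscr{S}} g_{\rho,\sigma,\dots,\phi}(t_j, \bar{x})[z^\rho(t) - z^\rho(t_j)] \cdots [z^\phi(t) - z^\phi(t_j)] \Bigr\| \le \varepsilon (1 + \sup\{\|\bar{x}(\tau)\| : \tau \in T,\ \tau \le t\}) \quad (t_j \le t < t_{j+1}),$

(8.5)  $\Bigl\| \bar{x}(t_{j+1}) - \bar{x}(t_j) - \Delta_j y - \sum_{\mathscr{S}} g_{\rho,\sigma,\dots,\phi}(t_j, \bar{x})\,\Delta_j z^\rho \Delta_j z^\sigma \cdots \Delta_j z^\phi \Bigr\| \le \varepsilon (1 + \sup\{\|\bar{x}(\tau)\| : \tau \in T,\ \tau \le t_{j+1}\})\,\Delta_j t;$

(h) if $\tau \in T$ and $\tau \le b$, $\bar{x}(\tau)$ is $F_\tau$ measurable.

(The Cauchy-Maruyama functions (8.2) clearly satisfy these requirements.) We can then prove the following theorem.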

Theorem 8.1. Let the hypotheses of Theorem 7.1 hold. Assume that to each Cauchy partition $\Pi$ of $[a, b]$ there corresponds a process $\bar{x}$ in $\mathscr{X}$ such that (f), (g), and (h) hold. Then as mesh $\Pi$ tends to zero, $\bar{x}$ converges in L2 norm, uniformly on $T$, to the solution $x$ of (7.8) and (7.9).

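Proof. By Theorem 7.1, the solution $x$ of (7.8) and (7.9) exists, and we can and do choose it to be $F_\cdot$ measurable. Let $\Pi$ be a Cauchy partition, with notation (2.1); let $t$ be a point of $[a, b]$; and define

(8.6)  $t_\kappa$ = the largest division point of $\Pi$ that does not exceed $t$,

(8.7)  $N(t) = \sup\{\|\bar{x}(\tau) - x(\tau)\| : \tau \in T,\ \tau \le t\},$

(8.8)  $X^i(t) = y^i(t) + \sum_{j=1}^{\kappa-1} \sum_{\mathscr{S}} g^i_{\rho,\dots,\phi}(t_j, x)\,\Delta_j z^\rho \cdots \Delta_j z^\phi + \sum_{\mathscr{S}} g^i_{\rho,\dots,\phi}(t_\kappa, x)\,[z^\rho(t) - z^\rho(t_\kappa)] \cdots [z^\phi(t) - z^\phi(t_\kappa)].$

(For $\tau \in T$, $\tau \le a$ we take $X(\tau) = y(\tau)$.)

Let $M - 1$ be an upper bound for $\|x(\tau)\|$ on $T$, and let $\varepsilon$ be any number such that $0 < 2(1 + b - a)\varepsilon < 1$. By Theorem 4.1, there is a positive $\delta_1$ such that if mesh $\Pi < \delta_1$,

(8.9)  $\|X(\tau) - x(\tau)\| < \varepsilon \quad (\tau \in T).$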
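With $\delta$ as in (g), we let $\Pi$ be any Cauchy partition such that

(8.10)  mesh $\Pi < \min\{\delta, \delta_1\}$.

Then, with $t_\kappa$ defined by (8.6),

(8.11)
$$\begin{aligned}
\bar{x}^i(t) - x^i(t) &= X^i(t) - x^i(t) \\
&\quad + \sum_{j=1}^{\kappa-1} \Bigl[ \bar{x}^i(t_{j+1}) - \bar{x}^i(t_j) - \Delta_j y^i - \sum_{\mathscr{S}} g^i_{\rho,\dots,\phi}(t_j, \bar{x})\,\Delta_j z^\rho \cdots \Delta_j z^\phi \Bigr] \\
&\quad + \bar{x}^i(t) - \bar{x}^i(t_\kappa) - [y^i(t) - y^i(t_\kappa)] - \sum_{\mathscr{S}} g^i_{\rho,\dots,\phi}(t_\kappa, \bar{x})[z^\rho(t) - z^\rho(t_\kappa)] \cdots [z^\phi(t) - z^\phi(t_\kappa)] \\
&\quad + \sum_{j=1}^{\kappa-1} \sum_{\mathscr{S}} [g^i_{\rho,\dots,\phi}(t_j, \bar{x}) - g^i_{\rho,\dots,\phi}(t_j, x)]\,\Delta_j z^\rho \cdots \Delta_j z^\phi \\
&\quad + \sum_{\mathscr{S}} [g^i_{\rho,\dots,\phi}(t_\kappa, \bar{x}) - g^i_{\rho,\dots,\phi}(t_\kappa, x)][z^\rho(t) - z^\rho(t_\kappa)] \cdots [z^\phi(t) - z^\phi(t_\kappa)].
\end{aligned}$$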

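By hypothesis (7.10) of Theorem 7.1 with (5.1),

(8.12)
$$\begin{aligned}
&\Bigl\| \sum_{j=1}^{\kappa-1} [g^i_{\rho,\dots,\phi}(t_j, \bar{x}) - g^i_{\rho,\dots,\phi}(t_j, x)]\,\Delta_j z^\rho \cdots \Delta_j z^\phi + [g^i_{\rho,\dots,\phi}(t_\kappa, \bar{x}) - g^i_{\rho,\dots,\phi}(t_\kappa, x)][z^\rho(t) - z^\rho(t_\kappa)] \cdots [z^\phi(t) - z^\phi(t_\kappa)] \Bigr\| \\
&\quad \le B \Bigl\{ \sum_{j=1}^{\kappa-1} L^2 N(t_j)^2 \Delta_j t + L^2 N(t_\kappa)^2 [t - t_\kappa] \Bigr\}^{1/2} \le BL \Bigl\{ \int_a^t N(s)^2\,ds \Bigr\}^{1/2}.
\end{aligned}$$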


Since $M - 1$ is an upper bound for $\|x\|$, by (8.7) $\|\bar{x}(t)\| \le M - 1 + N(t)$, and the right members of (8.4) and (8.5) are at most $\varepsilon(M + N(t))$ and $\varepsilon(M + N(t_{j+1}))\Delta_j t$, respectively. So if $C$ is the number of members of the set $\mathscr{S}$, from (g), (8.9), (8.10), (8.11), and (8.12), we deduce

(8.13)  $\|\bar{x}(t) - x(t)\| \le \varepsilon + \varepsilon\{M + N(t_\kappa)\}(b - a) + \varepsilon\{M + N(t)\} + CBL \Bigl\{ \int_a^t N(s)^2\,ds \Bigr\}^{1/2}.$

The right member is a nondecreasing function of $t$, so (8.13) remains valid if we replace $t_\kappa$ in the right member by $t$ and then replace $t$ by any larger number,




or equivalently replace $t$ in the left member by any smaller number. So

(8.14)  $N(t) \le \varepsilon[1 + M(1 + b - a)] + \varepsilon(1 + b - a)N(t) + CBL \Bigl\{ \int_a^t N(s)^2\,ds \Bigr\}^{1/2}.$

Since we have chosen $\varepsilon$ such that $0 < 2(1 + b - a)\varepsilon < 1$, this implies

(8.15)  $N(t) \le 2\varepsilon[1 + M(1 + b - a)] + 2CBL \Bigl\{ \int_a^t N(s)^2\,ds \Bigr\}^{1/2}.$

To condense notation, we write

(8.16)  $P = 2[1 + M(1 + b - a)], \qquad Q = 2CBL.$

Then from (8.15) we can deduce that

(8.17)  $N(t) \le 2\varepsilon P \exp\{2P^2Q^2[t - a]\} \quad (a \le t \le b),$

for (8.17) holds at $t = a$. If it does not hold everywhere in $[a, b]$, there is a first point $t_0$ at which it fails. Then it holds on $[a, t_0)$, so by (8.15),

(8.18)
$$\begin{aligned}
N(t_0) &\le \varepsilon P + Q \Bigl\{ \int_a^{t_0} 4\varepsilon^2 P^2 \exp\{4P^2Q^2[s - a]\}\,ds \Bigr\}^{1/2} \\
&= \varepsilon P + \varepsilon \bigl[ \exp\{4P^2Q^2[t_0 - a]\} - 1 \bigr]^{1/2} \\
&< 2\varepsilon P \exp\{2P^2Q^2[t_0 - a]\},
\end{aligned}$$

contradicting the assumption that (8.17) fails at $t_0$. Since for every positive $\varepsilon$, (8.17) holds whenever (8.10) does, $\|\bar{x}(t) - x(t)\|$ converges uniformly to 0 as mesh $\Pi \to 0$, which completes the proof.
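The middle step of (8.18) and its final strict inequality can be verified directly. Assume (8.17) in the form $N(s) \le 2\varepsilon P \exp\{2P^2Q^2[s - a]\}$ for $a \le s < t_0$, and write $c = 2P^2Q^2[t_0 - a]$ (a shorthand used only here). Then

$$Q\Bigl\{\int_a^{t_0} 4\varepsilon^2P^2 e^{4P^2Q^2(s-a)}\,ds\Bigr\}^{1/2} = Q\Bigl\{\frac{4\varepsilon^2P^2}{4P^2Q^2}\bigl(e^{2c}-1\bigr)\Bigr\}^{1/2} = \varepsilon\bigl(e^{2c}-1\bigr)^{1/2} < \varepsilon e^{c}.$$

Since $M \ge 1$ gives $P = 2[1 + M(1 + b - a)] \ge 2$, we get $\varepsilon P + \varepsilon(e^{2c} - 1)^{1/2} < \varepsilon P e^{c} + \tfrac{1}{2}\varepsilon P e^{c} < 2\varepsilon P e^{c}$, which is the last line of (8.18).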

The only use made of the enlarged class $\mathscr{X}$ defined by Assumption 7.4 was to guarantee that the coefficients $g(t_j, \bar{x})$ are defined. If the $\bar{x}$ are the Cauchy-Maruyama functions defined by (8.2), they are in $H_{1/2}(T, F_\cdot)$, and we can use this for our class $\mathscr{X}$, abandoning Assumption 7.4.


9.  Stochastic  differential  equations  and  related  ordinary  equations 

We shall now return to stochastic differential equations like those in Sections 1 to 3, in which the coefficients $g^i_\rho$, and so forth, are functions of $t$ and $x(t)$ and independent of earlier values $x(\tau)$. We suppose that these have the properties ascribed to the coefficients $G^i_0$, and so forth, as stated after Assumption 7.3; but we use $g$ instead of $G$. Moreover, we take $T$ to be the same as $[a, b]$, and $y(t)$ is simply an initial value $x_0$, which is an $F_a$ measurable r.v. For such equations we shall show that, with the definition in (3.5), equations of the form (3.4) have the following stability property: for a rather large class of processes $Z^\rho$ interpolated in the $z^\rho$ and having piecewise smooth sample paths, the solutions of (3.3) with the $Z^\rho$ in place of the $z^\rho$ tend to those of (3.4) with the $z^\rho$, uniformly in near L2 norm.

To avoid inordinately long formulae, we change the notation somewhat. We define

(9.1)  $z^0(t) = t,\ -\infty < t < \infty; \qquad g^0_1(x) = \cdots = g^0_r(x) = 0,\ x \in R^{n+1}; \qquad x^0(t) = t.$

The variables $\alpha, \beta$ will always have range $\{0, \dots, n\}$, and $\rho, \sigma, \tau$ will have range $\{0, \dots, r\}$. A summation sign such as $\sum_\alpha$ or $\sum_\rho$ will denote the sum over the whole range of that variable. Also, $\Pi$ will always denote a Cauchy partition with notation (8.1), and $t_j$ will denote a division point of $\Pi$. An equation such as $u^i_\rho = v^i_\rho$ will always be understood to hold for all $i, \rho$ in the range of those variables, unless some other range is expressly specified.

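For all $x^0$ in $[a, b]$ and $(x^1, \dots, x^n)$ in $R^n$ we define

(9.2)  $g^i_{\rho,\sigma}(x) = \sum_\alpha g^\alpha_\sigma(x)\,\dfrac{\partial g^i_\rho(x)}{\partial x^\alpha},$

provided that the indicated derivatives exist. Equations (1.3) now take the form

(9.3)  $x^i(t) = x^i_0 + \sum_\rho \int_a^t g^i_\rho(x(s))\,dz^\rho(s),$

where the initial value $x_0$ is always assumed to be an $F_a$ measurable r.v. The analogue of (3.4), with (3.5), is

(9.4)  $x^i(t) = x^i_0 + \sum_\rho \int_a^t g^i_\rho(x(s))\,dz^\rho(s) + \tfrac{1}{2} \sum_{\rho,\sigma} \int_a^t g^i_{\rho,\sigma}(x(s))\,dz^\rho(s)\,dz^\sigma(s).$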

This is not identical with (3.4), for the last sum contains terms with $\rho = 0$ or $\sigma = 0$, and (3.4) does not. However, by Theorem 6.2, all such integrals vanish for all processes $z^\rho$ that we shall consider. Furthermore, even if $\rho$ and $\sigma$ are positive, the definition (9.2) contains a term (with $\alpha = 0$) which is lacking in (3.5). But by (9.1) this term is 0. So the solutions of (9.4) are the same as those of (3.4) with (3.5) for all processes $z^\rho$ that we shall permit.

In Theorem 7.1, we assumed that the coefficients were Lipschitzian in $x(\cdot)$, and merely almost everywhere continuous in $t$. To simplify proofs, we now replace this by a somewhat unnecessarily strong substitute:

Assumption 9.1. The functions $g^i_\rho$ are continuously differentiable on the set of $x$ with $a \le x^0 \le b$; and there is a positive $L$ such that, if $x'$ and $x''$ are both in that set,

(9.5)  $|g^i_\rho(x') - g^i_\rho(x'')| \le L|x' - x''|, \qquad |g^i_{\rho,\sigma}(x') - g^i_{\rho,\sigma}(x'')| \le L|x' - x''|.$
Instead of restricting ourselves to linear interpolation as mentioned in Section 1, we shall permit certain other kinds. Let

(9.6)  $\phi_\rho(t),\ 0 \le t \le 1, \quad \rho = 0, 1, \dots, r,$

be Lipschitzian functions such that

(9.7)  $\phi_\rho(0) = 0, \qquad \phi_\rho(1) = 1,$

and

(9.8)  $\phi_0(t) = t.$

Then for each $\rho$ and each Cauchy partition $\Pi$, we define functions $Z^\rho$ by setting

(9.9)  $Z^\rho(t, \omega) = z^\rho(t_j, \omega) + \phi_\rho\Bigl( \dfrac{t - t_j}{\Delta_j t} \Bigr) \Delta_j z^\rho(\omega), \quad t_j \le t \le t_{j+1}.$

In particular,

(9.10)  $Z^0(t) = t, \quad a \le t \le b.$

We define

(9.11)  $J_{\rho,\sigma} = \int_0^1 [1 - \phi_\rho(s)]\,\phi_\sigma'(s)\,ds.$
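The constants $J_{\rho,\sigma}$ in (9.11) are ordinary integrals and are easy to tabulate. The sketch below approximates them with the midpoint rule for two sample interpolation shapes, the linear $\phi(s) = s$ and the hypothetical $\phi(s) = s^2$, both of which satisfy (9.7). It shows that $J_{\rho,\rho} = 1/2$ for either shape (indeed $\int_0^1 [1 - \phi]\phi'\,ds = 1 - 1/2$ for any $\phi$ satisfying (9.7)), and that mixed pairs satisfy $J_{\rho,\sigma} + J_{\sigma,\rho} = 1$, which follows from (9.7) by integration by parts.

```python
def J(phi_rho, dphi_sigma, m=10_000):
    """Midpoint-rule approximation of
    J_{rho,sigma} = int_0^1 [1 - phi_rho(s)] * phi_sigma'(s) ds."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        s = (i + 0.5) * h
        total += (1.0 - phi_rho(s)) * dphi_sigma(s)
    return total * h


def linear(s):
    return s


def d_linear(s):
    return 1.0


def square(s):  # hypothetical alternative shape with square(0) = 0, square(1) = 1
    return s * s


def d_square(s):
    return 2.0 * s


print(J(linear, d_linear), J(square, d_square))  # both close to 1/2
print(J(linear, d_square), J(square, d_linear))  # close to 1/3 and 2/3
```

With linear interpolation in every $z^\rho$ all the $J_{\rho,\sigma}$ equal 1/2, so condition (ii) of Theorem 9.1 then holds for every pair.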

Our principal stability theorem, which we now state, overlaps considerably with the results of Wong and Zakai ([8], [9]). Although the present methods are different, Theorem 9.1 obviously owes its existence to those previous results. Besides this, the present version of Theorem 9.1 replaces an earlier version with stronger hypotheses because Professor Wong pointed out the desirability of improvement.

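Theorem 9.1. Let Assumption 9.1 hold, and let the $z^\rho$ satisfy Condition 6.1. Let $\phi_0, \dots, \phi_r$ have the properties described above. Assume that for each $\rho$ and $\sigma$ in $\{0, 1, \dots, r\}$, either:

(i) to each $\varepsilon > 0$ corresponds a $\delta > 0$ such that if $a \le s \le t \le b$ and $t - s < \delta$, then a.s.

$|E([z^\rho(t) - z^\rho(s)][z^\sigma(t) - z^\sigma(s)] \mid F_s)| \le \varepsilon(t - s),$

$E([z^\rho(t) - z^\rho(s)]^2 [z^\sigma(t) - z^\sigma(s)]^2 \mid F_s) \le \varepsilon(t - s),$

or else

(ii) $J_{\rho,\sigma} = 1/2$.

Then, as mesh $\Pi \to 0$, the solution $X$ of

(9.13)  $X^i(t) = x^i_0 + \sum_\rho \int_a^t g^i_\rho(X(s))\,dZ^\rho(s), \quad a \le t \le b,$

converges uniformly in near L2 norm on $[a, b]$ to the solution $x$ of

(9.14)  $x^i(t) = x^i_0 + \sum_\rho \int_a^t g^i_\rho(x(s))\,dz^\rho(s) + \tfrac{1}{2} \sum_{\rho,\sigma} \int_a^t g^i_{\rho,\sigma}(x(s))\,dz^\rho(s)\,dz^\sigma(s), \quad a \le t \le b.$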

Proof. Observe that if $\rho = \sigma$, condition (ii) is satisfied, while if $\rho = 0$ or $\sigma = 0$, condition (i) holds.

The solution $x$ of (9.14) also satisfies

(9.15)  $x^i(t) = x^i_0 + \sum_\rho \int_a^t g^i_\rho(x(s))\,dz^\rho(s) + \sum_{\rho,\sigma} J_{\rho,\sigma} \int_a^t g^i_{\rho,\sigma}(x(s))\,dz^\rho(s)\,dz^\sigma(s), \quad a \le t \le b,$

since those integrals in (9.15) whose coefficients $J_{\rho,\sigma}$ differ from 1/2 all vanish by (5.2).

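We again define $\Delta_j t$ and $\Delta_j z^\rho$ by (8.3). Let $\varepsilon$ be positive, and let $\delta$ and $A$ correspond to $\varepsilon$ for all the $z^\rho$ as in Condition 6.1. Let $\Pi$ be a Cauchy partition with mesh $\Pi < \delta$. For each $h$ in $\{1, \dots, k\}$, we define $A_h$ to be the set of $\omega$ in $\Omega$ such that the inequalities

(9.16)  $|\Delta_j z^\rho(\omega)| \le \varepsilon (\Delta_j t)^{1/3}, \quad \rho = 1, \dots, r,$

all hold for $j = 1, \dots, h$; then $A_h \supset A$. Corresponding to $\Pi$, we now define a process $\bar{x}$ as follows. First, $\bar{x}(a) = x_0$. Then, $\bar{x}(t_j, \omega)$ having been defined, we define

(9.17)  $\bar{x}^i(t_{j+1}, \omega) = X^i(t_{j+1}, \omega)$ if $\omega \in A_j$,

(9.18)  $\bar{x}^i(t_{j+1}, \omega) = \bar{x}^i(t_j, \omega) + \sum_\rho g^i_\rho(\bar{x}(t_j, \omega))\,\Delta_j z^\rho + \sum_{\rho,\sigma} J_{\rho,\sigma}\, g^i_{\rho,\sigma}(\bar{x}(t_j, \omega))\,\Delta_j z^\rho \Delta_j z^\sigma$ if $\omega \in \Omega - A_j$.

In either case we define

(9.19)  $\bar{x}^i(t, \omega) = \bar{x}^i(t_j, \omega), \quad t_j \le t < t_{j+1}.$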
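To see why the coefficient $J_{\rho,\sigma}$ in (9.18) matters, take the hypothetical scalar case $n = r = 1$ with no $dt$ coefficient and $g^1_1(x) = x$, so that $g^1_{1,1}(x) = x$ by (9.2). For linear interpolation the ordinary equation $dX/dt = X\,dZ/dt$ has the exact solution $X(t) = x_0\exp\{Z(t) - Z(a)\}$, and $J_{1,1} = 1/2$. The sketch compares that value at $t = b$ with the step rule (9.18) applied at every step (it ignores the sets $A_j$), once with $J_{1,1} = 1/2$ and once with the coefficient set to 0; the seed, grid, and function name are illustrative assumptions.

```python
import math
import random


def step_rule_9_18(x0, dz, J11=0.5):
    """Iterate (9.18) for the scalar example g_1(x) = x, g_{1,1}(x) = x, no dt term:
        x_{j+1} = x_j + x_j * dz_j + J11 * x_j * dz_j**2."""
    x = x0
    for d in dz:
        x = x + x * d + J11 * x * d * d
    return x


rng = random.Random(6)
n = 100_000
dz = [rng.gauss(0.0, math.sqrt(1.0 / n)) for _ in range(n)]  # Wiener increments on [0, 1]
ode_value = math.exp(sum(dz))                # X(b) for the piecewise-linear Z, x0 = 1
with_J = step_rule_9_18(1.0, dz)             # J_{1,1} = 1/2
without_J = step_rule_9_18(1.0, dz, J11=0.0)  # correction term dropped
print(with_J / ode_value, without_J / ode_value, math.exp(-0.5))
```

The first ratio should be close to 1, while dropping the correction leaves a ratio near $e^{-1/2}$, the familiar gap between the two interpretations of $\int x\,dz$.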

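The set $A_j$ defined by (9.16) is $F(t_{j+1})$ measurable, and $Z^\rho(t)$ is a linear function of $z^\rho(t_j)$ and $z^\rho(t_{j+1})$ for $t_j \le t \le t_{j+1}$; so by (9.13), $X(t_{j+1})$ is a continuous function of the $z^\rho(t_h)$ for $h = 1, \dots, j + 1$, and is $F(t_{j+1})$ measurable. So by (9.17) and (9.18), $\bar{x}(t_{j+1})$ is $F(t_{j+1})$ measurable, and hypothesis (h) is satisfied. So is (f); and (8.4) follows readily from (9.19) and Condition 4.1.

If $\omega \in A_j$, from (9.17) we obtain, by integration by parts in (9.13),

(9.20)
$$\begin{aligned}
&X^i(t_{j+1}) - X^i(t_j) - \sum_\rho g^i_\rho(X(t_j))\,\Delta_j z^\rho - \sum_{\rho,\sigma} g^i_{\rho,\sigma}(X(t_j)) \int_{t_j}^{t_{j+1}} [Z^\rho(t_{j+1}) - Z^\rho(s)]\,\dot{Z}^\sigma(s)\,ds \\
&\quad = \sum_{\rho,\sigma} \int_{t_j}^{t_{j+1}} [g^i_{\rho,\sigma}(X(s)) - g^i_{\rho,\sigma}(X(t_j))][Z^\rho(t_{j+1}) - Z^\rho(s)]\,\dot{Z}^\sigma(s)\,ds.
\end{aligned}$$

By (9.9) and (9.11), the integral in the left member equals $J_{\rho,\sigma}\,\Delta_j z^\rho \Delta_j z^\sigma$.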
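For the rest of this proof $C_1$, $C_2$, and so forth, will denote positive numbers whose values are determined by the numbers $n$, $r$, $K$, $L$, $g^i_\rho(0)$ and $\sup|\phi_\rho'(t)|$; we omit the easy but uninspiring computation of the expressions for the $C_i$.

For $t$ in $[t_j, t_{j+1}]$ and $\omega$ in $A_j$, we define

(9.21)  $N(t, \omega) = \sup\{|X(\tau, \omega) - \bar{x}(t_j, \omega)| : t_j \le \tau \le t\}.$

Then by Assumption 9.1,

(9.22)  $|g^i_\rho(X(t, \omega))| \le |g^i_\rho(0)| + |g^i_\rho(\bar{x}(t_j, \omega)) - g^i_\rho(0)| + |g^i_\rho(X(t, \omega)) - g^i_\rho(\bar{x}(t_j, \omega))| \le C_1 + L|\bar{x}(t_j, \omega)| + L N(t, \omega).$

Hence by (9.17) and (9.16),

(9.23)  $|X^i(t, \omega) - \bar{x}^i(t_j, \omega)| \le \sum_\rho \int_{t_j}^{t} |g^i_\rho(X(s))\,\dot{Z}^\rho(s)|\,ds \le \varepsilon[C_2 + C_3|\bar{x}(t_j, \omega)| + C_4 N(t, \omega)](\Delta_j t)^{1/3}.$

This remains valid if in the left member we replace $t$ by any number $\tau$ in $[t_j, t]$, so

(9.24)  $N(t, \omega) \le \varepsilon[C_5 + C_6|\bar{x}(t_j, \omega)| + C_7 N(t, \omega)](\Delta_j t)^{1/3}.$

Since $C_7$ does not depend on $\varepsilon$, we may and shall restrict our attention to $\varepsilon$ such that

(9.25)  $0 < \varepsilon < \tfrac{1}{2} C_7^{-1} (b - a)^{-1/3}.$

Then from (9.24), we obtain

(9.26)  $N(t, \omega) \le 2\varepsilon[C_5 + C_6|\bar{x}(t_j, \omega)|](\Delta_j t)^{1/3}.$

From this, with (9.20) and (9.21),

(9.27)  $\Bigl| \bar{x}^i(t_{j+1}) - \bar{x}^i(t_j) - \sum_\rho g^i_\rho(\bar{x}(t_j))\,\Delta_j z^\rho - \sum_{\rho,\sigma} J_{\rho,\sigma}\, g^i_{\rho,\sigma}(\bar{x}(t_j))\,\Delta_j z^\rho \Delta_j z^\sigma \Bigr| \le (C_8 + C_9|\bar{x}(t_j)|)\,\varepsilon\,\Delta_j t.$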

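If $\omega \in \Omega - A_j$, this is trivial; the left member of (9.27) is 0 by definition.

By (9.27),

(9.28)  $\Bigl\| \bar{x}(t_{j+1}) - \bar{x}(t_j) - \sum_\rho g_\rho(\bar{x}(t_j))\,\Delta_j z^\rho - \sum_{\rho,\sigma} J_{\rho,\sigma}\, g_{\rho,\sigma}(\bar{x}(t_j))\,\Delta_j z^\rho \Delta_j z^\sigma \Bigr\| \le \|C_8 \varepsilon \Delta_j t\| + \|C_9 \varepsilon \Delta_j t\, \bar{x}(t_j)\|;$

so (8.5) is satisfied, and by Theorem 8.1, $\bar{x}$ converges to $x$ uniformly in L2 norm as mesh $\Pi \to 0$.

If $\omega \in A_j$, $\bar{x}(t_j)$ was defined to be $X(t_j)$, and by (9.21) and (9.26),

(9.29)  $|X^i(t) - \bar{x}^i(t)| \le 2\varepsilon[C_5 + C_6|\bar{x}^i(t)|](\Delta_j t)^{1/3}.$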
Since $\bar{x}$ converges uniformly in L2 norm to $x$, its L2 norm is bounded, and (9.29) implies that if mesh $\Pi$ is small,

(9.30)  $\Bigl\{ \int_A |X(t, \omega) - \bar{x}(t, \omega)|^2\,P(d\omega) \Bigr\}^{1/2} \le C_{10}\,\varepsilon\,(\text{mesh}\ \Pi)^{1/3}.$

This, with the uniform convergence of $\bar{x}$ to $x$ in L2 norm, shows that $X$ tends to $x$ uniformly in near L2 norm as mesh $\Pi$ tends to 0.


10.  Stability,  and  its  limitations 

Theorem 9.1 can be regarded as a statement about stability of the solutions of (9.14). Since the last set of stochastic integrals in (9.14) vanishes for Lipschitzian $Z^\rho$, the theorem informs us that, on the family of disturbances $Z^\rho$ consisting of one process satisfying Condition 4.1 and Condition 6.1 together with all processes interpolated in the $z^\rho$ in accordance with Theorem 9.1, the solutions of (9.14) depend, in a stable or continuous manner, on the disturbances. By estimating the closeness of all approximations we could extend this to a larger collection of $z^\rho$, all satisfying Condition 4.1 and Condition 6.1 with the same constants, together with all disturbances interpolated in them as in Theorem 9.1.

It would be desirable to permit another kind of interpolation often encountered in applications, in which the $z^\rho$ are approximated by functions that coincide with $z^\rho$ at evenly spaced points and have derivatives whose Fourier transforms vanish outside some finite interval. Theorem 9.1 gives us a feeble substitute for this. Let $\Psi$ be infinitely differentiable and nondecreasing on $(-\infty, \infty)$, with $\Psi(t) = 0$ if $t \le 0$ and $\Psi(t) = 1$ if $t \ge 1$. We choose

(10.1)  $\phi_\rho(t) = \Psi(t), \quad \rho = 1, \dots, r,$

and for each $\Pi$, we interpolate $Z^\rho$ in $z^\rho$ using the $\phi_\rho$ and extend $Z^\rho$ by constancy on $(-\infty, a]$ and $[b, \infty)$. Then the derivatives of the $Z^\rho$ have Fourier transforms that tend to 0 at $\pm\infty$ faster than any negative power of the independent variable, and the $Z^\rho$ satisfy the requirements of Theorem 9.1.

For the case $n = r = 1$, with $z^1$ a Wiener process, Wong and Zakai [9] have proved a theorem that shows the possibility of using $Z^\rho$ whose derivatives have bounded spectra. Omitting superscripts $i$ and $\rho$, let $Z_1, Z_2, \dots$ be a sequence of processes on $[a, b]$ such that with probability 1, $Z_n(t, \omega)$ tends pointwise to $z(t, \omega)$ and has a bounded piecewise continuous derivative, and such that there are finite valued processes $n_0, k$ such that a.s.

(10.2)  $|Z_n(t, \omega)| \le k(\omega), \quad a \le t \le b, \text{ if } n \ge n_0(\omega).$

Wong and Zakai then showed that, under essentially the same hypotheses on $g_1$ and $g_{1,1}$ as in Theorem 9.1, the solutions $X$ of (9.13) converge almost surely to the solution $x$ of (9.14) (in which we can replace $dz\,dz$ by $dt$, by (6.3)).
This theorem does not extend to the case $n = r = 2$, even when $z^1$ and $z^2$ are independent Brownian motions, as the following example shows. Consider the stochastic differential equations

(10.3)  $x^1(t) = \int_0^t dz^1(s), \qquad x^2(t) = \int_0^t x^1(s)\,dz^2(s), \qquad 0 \le t \le 1,$

in which $z^1$ and $z^2$ are independent standard Wiener processes. Let $\psi_1$ and $\psi_2$ be infinitely differentiable nondecreasing functions on $(-\infty, \infty)$ such that

(10.4)  $\psi_1(t) = 0$ if $t \le 0$, $\quad \psi_1(t) = 1$ if $t \ge 1/2$,

(10.5)  $\psi_2(t) = 0$ if $t \le 1/2$, $\quad \psi_2(t) = 1$ if $t \ge 1$.

Then

(10.6)  $\int_0^1 [1 - \psi_1(s)]\,\psi_2'(s)\,ds = 0, \qquad \int_0^1 [1 - \psi_2(s)]\,\psi_1'(s)\,ds = 1.$
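The two values in (10.6) can be checked numerically. The sketch assumes one concrete choice of $\psi_1$ and $\psi_2$ (the text only requires the stated properties): both are built from the standard $C^\infty$ step $s(t) = e(t)/[e(t) + e(1 - t)]$, with $e(t) = \exp(-1/t)$ for $t > 0$ and $e(t) = 0$ otherwise, rescaled so that $\psi_1$ rises on $[0, 1/2]$ and $\psi_2$ on $[1/2, 1]$. To avoid differentiating, each integral $\int_0^1 [1 - \psi_\rho]\,\psi_\sigma'\,ds$ is approximated by a Riemann-Stieltjes sum.

```python
import math


def smooth_step(t):
    """C-infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    def e(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return e(t) / (e(t) + e(1.0 - t))


def psi1(t):  # rises from 0 to 1 on [0, 1/2]
    return smooth_step(2.0 * t)


def psi2(t):  # rises from 0 to 1 on [1/2, 1]
    return smooth_step(2.0 * t - 1.0)


def J(psi_rho, psi_sigma, m=20_000):
    """Riemann-Stieltjes sum for int_0^1 [1 - psi_rho(s)] d psi_sigma(s)."""
    total = 0.0
    for i in range(m):
        s0, s1 = i / m, (i + 1) / m
        total += (1.0 - psi_rho(0.5 * (s0 + s1))) * (psi_sigma(s1) - psi_sigma(s0))
    return total


print(J(psi1, psi2), J(psi2, psi1), J(psi1, psi1))  # about 0, 1 and 1/2
```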

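Given a Cauchy partition $\Pi$ with the usual notation (8.1), for $\rho = 1, 2$ and for $t_j \le t \le t_{j+1}$, we define

(10.7)  $Z^\rho_1(t, \omega) = z^\rho(t_j, \omega) + \psi_\rho\Bigl( \dfrac{t - t_j}{\Delta_j t} \Bigr) \Delta_j z^\rho(\omega), \qquad Z^\rho_2(t, \omega) = z^\rho(t_j, \omega) + \psi_{3-\rho}\Bigl( \dfrac{t - t_j}{\Delta_j t} \Bigr) \Delta_j z^\rho(\omega)$

if

(10.8)  $\Delta_j z^1(\omega)\,\Delta_j z^2(\omega) \ge 0,$

and we define $Z^\rho_1(t, \omega)$ and $Z^\rho_2(t, \omega)$ by (10.7) with the right members interchanged if (10.8) is false. We extend $Z^\rho_1$ and $Z^\rho_2$ by constancy on $(-\infty, a]$ and on $[b, \infty)$. Then $Z^\rho_1$ and $Z^\rho_2$, $\rho = 1, 2$, are infinitely differentiable, have the same bounds as $z^\rho$, and with probability 1 converge to $z^\rho$ uniformly on $[a, b]$. By (9.20) and (10.6), for the corresponding solutions $X_1 = (X^1_1, X^2_1)$ and $X_2 = (X^1_2, X^2_2)$ of (10.3) we have

(10.9)  $X^1_1(t_{j+1}) - X^1_1(t_j) = \Delta_j z^1, \qquad X^2_1(t_{j+1}) - X^2_1(t_j) = X^1_1(t_j)\,\Delta_j z^2 + \max\{\Delta_j z^1 \Delta_j z^2, 0\},$

and

(10.10)  $X^1_2(t_{j+1}) - X^1_2(t_j) = \Delta_j z^1, \qquad X^2_2(t_{j+1}) - X^2_2(t_j) = X^1_2(t_j)\,\Delta_j z^2 + \min\{\Delta_j z^1 \Delta_j z^2, 0\}.$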
Define

(10.11)  $\xi^\rho = X^\rho_1 - X^\rho_2, \quad \rho = 1, 2;$

then from (10.9) and (10.10)

(10.12)  $\xi^1(t_{j+1}) - \xi^1(t_j) = 0,$

(10.13)  $\xi^2(t_{j+1}) - \xi^2(t_j) = \xi^1(t_j)\,\Delta_j z^2 + |\Delta_j z^1 \Delta_j z^2|.$

From (10.12), $\xi^1(t_j) = 0$ for all $j$, so by (10.13)

(10.14)  $\xi^2(1) = \sum_j |\Delta_j z^1 \Delta_j z^2|.$

Since the $\Delta_j z^1$ and $\Delta_j z^2$ are independent normal r.v., $\Delta_j z^\rho$ having mean 0 and variance $\Delta_j t$, we readily compute

(10.15)  $E\,\xi^2(1) = \sum_j \dfrac{2\Delta_j t}{\pi} = \dfrac{2}{\pi}$

and

(10.16)  $\operatorname{Var} \xi^2(1) \le \sum_j E([\Delta_j z^1]^2 [\Delta_j z^2]^2) = \sum_j (\Delta_j t)^2,$

which tends to 0 with mesh $\Pi$. So $X^2_1(1) - X^2_2(1)$ tends in L2 norm to $2/\pi$ as mesh $\Pi$ tends to 0, and it is impossible that $X^2_1(1)$ and $X^2_2(1)$ both tend to the same limit $x^2(1)$, a.s., or even in probability.
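A simulation makes the size of the discrepancy concrete. The sketch assumes, as (9.20) and (10.6) give for this system, that over each step the solution driven by the first interpolation picks up the extra amount $\max\{\Delta_j z^1\Delta_j z^2, 0\}$ in its second component and the solution driven by the second picks up $\min\{\Delta_j z^1\Delta_j z^2, 0\}$, while the first components agree. Their difference at $t = 1$ is then $\sum_j |\Delta_j z^1\Delta_j z^2|$, whose mean is $2/\pi$ because $E|\Delta_j z^\rho| = (2\Delta_j t/\pi)^{1/2}$ and the two increments are independent. The uniform grid and the seed are choices made for the illustration.

```python
import math
import random


def xi2_at_one(n, seed=0):
    """Run the two one-step recursions for the system (10.3) on a uniform
    partition of [0, 1] with n steps, both driven by the same simulated
    increments of independent Wiener processes z1, z2; return the difference
    of the second components at t = 1."""
    rng = random.Random(seed)
    h = 1.0 / n
    a1 = a2 = 0.0  # first and second components, first interpolation
    b1 = b2 = 0.0  # first and second components, second interpolation
    for _ in range(n):
        d1 = rng.gauss(0.0, math.sqrt(h))
        d2 = rng.gauss(0.0, math.sqrt(h))
        p = d1 * d2
        a2 += a1 * d2 + max(p, 0.0)  # uses the first component at t_j
        b2 += b1 * d2 + min(p, 0.0)
        a1 += d1
        b1 += d1
    return a2 - b2


for n in (100, 10_000, 100_000):
    print(n, xi2_at_one(n, seed=7), 2 / math.pi)
```

Refining the grid does not close the gap; it only reduces the spread around $2/\pi$, in agreement with (10.16).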

The example shows the inherent limitations on the stability of models. With a system as simple as (10.3), when mesh $\Pi$ is small, the results of linear interpolation in $z^p$ and of the interpolation (10.9) in $z^p$ will have differences that are uniformly arbitrarily small for almost all $\omega$. Yet the solutions $X$ and $\tilde X$ of the ordinary equations (10.4) and (10.5), corresponding to those two practically indistinguishable disturbances, will not be arbitrarily close to each other in $L_2$ norm. Hence no "selection principle" can possibly provide a model that is at once consistent, inclusive enough to include Lipschitzian processes and Brownian motions, and so thoroughly stable as to yield practically indistinguishable solutions corresponding to practically indistinguishable disturbances. The limited stability described in Theorem 9.1 may be about as much as we can attain.
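The instability can also be seen numerically. The sketch below assumes, as (10.9) through (10.13) suggest but this section does not state, that (10.3) is $dx^1 = dz^1$, $dx^2 = x^1\,dz^2$. It also assumes zero initial values and independent Brownian motions $z^1, z^2$ on $[0, 1]$. On each step it applies the exact effect of a disturbance that moves one coordinate completely before the other, and the sign test (10.8) decides the order for $Z$ and for $\tilde Z$. The function name `two_orderings` is hypothetical.

```python
import math
import random

def two_orderings(n_steps: int, seed: int = 0):
    """Return (X^2(1), X~^2(1), sum_j |dz1 * dz2|) on [0, 1].

    Hypothetical model: dx1 = dz1, dx2 = x1 dz2, zero initial values,
    z1 and z2 independent Brownian motions sampled at n_steps equal steps.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 / n_steps)
    x1 = x2 = 0.0          # solution driven by Z
    xt1 = xt2 = 0.0        # solution driven by Z~
    total = 0.0
    for _ in range(n_steps):
        dz1 = rng.gauss(0.0, sd)
        dz2 = rng.gauss(0.0, sd)
        if dz1 * dz2 <= 0.0:           # (10.8) holds
            x2 += (x1 + dz1) * dz2     # Z moves z1 first, then z2
            xt2 += xt1 * dz2           # Z~ moves z2 first, then z1
        else:                          # (10.8) fails: roles interchanged
            x2 += x1 * dz2
            xt2 += (xt1 + dz1) * dz2
        x1 += dz1
        xt1 += dz1
        total += abs(dz1 * dz2)
    return x2, xt2, total

x2, xt2, total = two_orderings(100_000, seed=1)
print(xt2 - x2, total, 2 / math.pi)   # the gap equals the sum (10.14), near 2/pi
```

The two disturbances agree at every division point. Even so, the endpoint gap stays near $2/\pi \approx 0.64$ as the partition is refined, in line with (10.14) through (10.16).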

Perhaps  we  are  studying  the  problem  from  the  wrong  end.  As  mentioned  in 
Section  1,  if  we  wish  to  stay  in  the  domain  of  trustworthiness  of  classical 
scientific  theories,  we  should  hold  to  Lipschitzian  disturbances.  Idealizations 
to  martingales  are  made  for  mathematical  convenience,  and  they  depart  from 
the  Lipschitzian  case  so  far  that  no  martingale  can  have  a.s.  Lipschitzian  sample 
paths  unless  the  sample  paths  are  a.s.  constant  (see  Fisk,  [3]). 

In Theorem 9.1, and in the theorems of Wong and Zakai, the idealization is the starting point, and it is approximated by the Lipschitzian $Z^p$. Since it is the Lipschitzian case that the outside world presents to us, it would seem more significant to find how well we can approximate the processes of classical theory by our idealizations, rather than the reverse. But this would appear to be a difficult undertaking.


11.  A  Runge-Kutta  type  of  approximation 

The Cauchy (or Euler) polygons are useful in the theory of ordinary differential equations, but for computation they are much inferior to the Runge-Kutta approximations. As adapted to equations (9.3), this method can be described thus. Given a partition $\Pi$ (with the usual notation (8.1)), we define $y(a) = x_0$, and then define $y(t_1), y(t_2), \ldots$ successively as follows. From $y(t_j)$ we first compute, as in (8.2), the value of

(11.1)  $y^i(t_j) + \sum_p g^i_p(y(t_j))\,\Delta_j z^p.$

But instead of using the $g^i_p$ corresponding to these sums as coefficients for the next step, as in the Cauchy-Maruyama process, we average them with the $g^i_p(y(t_j))$. This furnishes a second approximation to the values of the $g^i_p$ for use in estimating $y(t_{j+1})$. Thus, we have

(11.2)  $y^i(t_{j+1}) = y^i(t_j) + \tfrac12 \sum_p g^i_p(y(t_j))\,\Delta_j z^p + \tfrac12 \sum_p g^i_p\Big(y(t_j) + \sum_\sigma g_\sigma(y(t_j))\,\Delta_j z^\sigma\Big)\Delta_j z^p.$

The values of $y$ at points interior to the intervals $[t_j, t_{j+1}]$ are of secondary interest; we could, for example, define them by linear interpolation.
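Here is a minimal Python sketch of the step (11.1)-(11.2) for a vector system driven by finitely many processes $z^p$. Each coefficient field $g_p$ is passed as a function returning the vector $(g^1_p(x), \ldots, g^n_p(x))$. A deterministic drift can be included by letting one driving process be $t$ itself. Names, signatures, and the rotation-field usage example are illustrative assumptions, not the paper's.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Field = Callable[[Sequence[float]], Vector]   # x -> (g^1_p(x), ..., g^n_p(x))

def heun_step(y: Sequence[float], fields: Sequence[Field],
              dz: Sequence[float]) -> Vector:
    """One step of (11.2): predictor (11.1), then averaged coefficients."""
    n = len(y)
    g_y = [g(y) for g in fields]
    # (11.1): Cauchy-Maruyama predictor y(t_j) + sum_p g_p(y(t_j)) dz^p
    pred = [y[i] + sum(gp[i] * d for gp, d in zip(g_y, dz)) for i in range(n)]
    g_pred = [g(pred) for g in fields]
    # (11.2): average g_p at y(t_j) and at the predictor
    return [y[i] + 0.5 * sum((a[i] + b[i]) * d for a, b, d in zip(g_y, g_pred, dz))
            for i in range(n)]

def heun_path(y0: Sequence[float], fields: Sequence[Field],
              increments: Sequence[Sequence[float]]) -> List[Vector]:
    """Values y(t_0), y(t_1), ... for the given increments Delta_j z^p."""
    path = [list(y0)]
    for dz in increments:
        path.append(heun_step(path[-1], fields, dz))
    return path

def interpolate(ts: Sequence[float], path: Sequence[Vector], t: float) -> Vector:
    """Linear interpolation between the division points t_j."""
    for j in range(len(ts) - 1):
        if ts[j] <= t <= ts[j + 1]:
            w = (t - ts[j]) / (ts[j + 1] - ts[j])
            return [(1 - w) * a + w * b for a, b in zip(path[j], path[j + 1])]
    raise ValueError("t outside [t_0, t_k]")

# usage: a rotation field driven by one process with three equal increments
path = heun_path([1.0, 0.0], [lambda x: [-x[1], x[0]]], [[0.1], [0.1], [0.1]])
print(interpolate([0.0, 0.1, 0.2, 0.3], path, 0.15))
```

Only two evaluations of each $g_p$ are needed per step, and no derivatives of the $g_p$ appear. This is the computational convenience the text goes on to stress.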

The preservation of a formula or an algorithm is a much less basic stability property than that discussed in the preceding section. Nevertheless, it is to some extent significant, as well as computationally convenient, that if we try to approximate solutions of (9.3) by the Runge-Kutta method for processes satisfying the hypotheses of Theorem 9.1 and one more continuity condition, Assumption 11.1, the approximations will converge, not to the solution of (9.3), but to the solution of (9.14). In a sense this gives added support to our "selection principle." Beyond this, it permits us to use a well-known computational procedure to approximate the solution of (9.14). This works whether the $z^p$ are Lipschitzian, or martingales, or any other processes satisfying the hypotheses of Theorem 9.1, without having to interpolate to find the $Z^p$ and without having to solve equations (9.13).
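The claim that the Runge-Kutta approximations converge to the solution of (9.14) rather than (9.3) can be checked on the scalar equation $dx = x\,dz$, with $z$ a Brownian motion. This sketch assumes that the Cauchy-Maruyama polygons of (8.2) are the relevant approximations to (9.3). Their products $\prod_j (1 + \Delta_j z)$ approach $\exp(z(1) - \tfrac12)$. The averaged step (11.2) gives $\prod_j (1 + \Delta_j z + \tfrac12(\Delta_j z)^2)$, which approaches $\exp(z(1))$: this is the limit carrying the extra $\tfrac12 g'g\,(\Delta z)^2$ term seen in (11.6). The coefficient $g(x) = x$ is a hypothetical test case.

```python
import math
import random

def euler_endpoint(x0, g, increments):
    # Cauchy-Maruyama polygon: coefficient frozen at the left end of each step
    x = x0
    for dz in increments:
        x = x + g(x) * dz
    return x

def heun_endpoint(x0, g, increments):
    # scheme (11.2) with a single driving process
    x = x0
    for dz in increments:
        predictor = x + g(x) * dz                  # (11.1)
        x = x + 0.5 * (g(x) + g(predictor)) * dz   # (11.2)
    return x

rng = random.Random(7)
n = 20_000
increments = [rng.gauss(0.0, math.sqrt(1.0 / n)) for _ in range(n)]
z1 = sum(increments)                 # z(1) for this simulated path
g = lambda x: x                      # hypothetical test coefficient: dx = x dz
e_end = euler_endpoint(1.0, g, increments)
h_end = heun_endpoint(1.0, g, increments)
print(math.log(e_end) - z1, math.log(h_end) - z1)   # near -1/2 and near 0
```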

Equation (11.2) may be regarded as the first step in the iterative solution of

(11.3)  $y^i(t_{j+1}) = y^i(t_j) + \tfrac12 \sum_p \big\{g^i_p[y(t_j)] + g^i_p[y(t_{j+1})]\big\}\,\Delta_j z^p.$
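For a single driving process, the fixed-point iteration behind (11.3) can be sketched as follows. The starting guess is the predictor (11.1), and one pass of the iteration reproduces (11.2). Further passes solve (11.3) when the map is a contraction (for this iteration, when $\tfrac12 |g'|\,|\Delta_j z| < 1$). The function name is hypothetical. For $g(x) = x$ the exact solution of (11.3) is $y(t_j)(1 + \tfrac12\Delta_j z)/(1 - \tfrac12\Delta_j z)$, which the usage lines compare against.

```python
def trapezoid_step(y, g, dz, iterations):
    """Iterate y_next <- y + (g(y) + g(y_next)) * dz / 2, the scalar form of (11.3).

    Starts from the predictor (11.1); iterations=1 gives exactly the step (11.2).
    Returns the final iterate and the sizes of the successive corrections.
    """
    y_next = y + g(y) * dz              # predictor (11.1)
    corrections = []
    for _ in range(iterations):
        new = y + 0.5 * (g(y) + g(y_next)) * dz
        corrections.append(abs(new - y_next))
        y_next = new
    return y_next, corrections

one_pass, _ = trapezoid_step(1.0, lambda x: x, 0.1, 1)      # the step (11.2)
solved, corr = trapezoid_step(1.0, lambda x: x, 0.1, 5)
print(one_pass, solved, 1.05 / 0.95, corr)
```

The shrinking list `corr` is the "diminishing sequence" of corrections discussed next; here each correction is $\tfrac12|\Delta_j z| = 0.05$ times the previous one.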
To guarantee the convergence of such an iterative process, it is desirable and usual to make assumptions that guarantee that the successive corrections form a diminishing sequence. One such assumption, for the present problem, is the following.

Assumption 11.1. There are positive numbers $\varepsilon_1$ and $K$ such that if $x_1$ and $x_2$ are points of $R^n$ with $|x_2 - x_1| \le \varepsilon_1(1 + |x_1|)$, then

(11.4)  $|g^i_{p,x^\alpha}(x_1) - g^i_{p,x^\alpha}(x_2)| \le K\,|x_1 - x_2|.$

This  rather  strong  uniform  continuity  requirement  will  be  further  discussed 
after  proving  the  next  theorem. 

Theorem 11.1. Let the $g^i_p$ and $z^p$ satisfy the hypotheses of Theorem 9.1, and also satisfy Assumption 11.1. For each Cauchy partition $\Pi$ of $[a, b]$, let $y$ be the process determined by the Runge-Kutta process (11.2), with linear interpolation between the division points of $\Pi$. Then as mesh $\Pi \to 0$, $y$ converges on $[a, b]$ uniformly in near $L_2$ norm to the solution $x$ of (9.14).

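Proof. Let $\varepsilon$ be positive, and let $\delta$ and $A$ correspond to $\varepsilon$ for all the $z^p$ as in Condition 6.1. We define the sets $A_j$ as in the sentence containing (9.16), and we define a process $\bar x$ corresponding to $\Pi$ as follows. First, $\bar x(a) = x_0$. Next, $\bar x(t_j)$ having been defined, we define $\bar x(t_{j+1})$ by

(11.5)  $\bar x(t_{j+1}, \omega) = y(t_{j+1}, \omega)$ if $\omega \in A_j$,

(11.6)  $\bar x^i(t_{j+1}, \omega) = \bar x^i(t_j, \omega) + \sum_p g^i_p(\bar x(t_j, \omega))\,\Delta_j z^p + \tfrac12 \sum_{p,\sigma,\alpha} g^i_{p,x^\alpha}(\bar x(t_j, \omega))\, g^\alpha_\sigma(\bar x(t_j, \omega))\,\Delta_j z^\sigma\,\Delta_j z^p$ if $\omega \in \Omega - A_j$.

Finally, we define

(11.7)  $\bar x(t, \omega) = \bar x(t_j, \omega), \quad t_j \le t < t_{j+1}.$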

It is easy to verify that these $\bar x$ satisfy conditions (f), (h), and (8.4). For $\omega$ in $A_j$, by applying the theorem of the mean to (11.2), we find

(11.8)  $y^i(t_{j+1}) = y^i(t_j) + \tfrac12 \sum_p g^i_p(y(t_j))\,\Delta_j z^p + \tfrac12 \sum_p \Big[g^i_p(y(t_j)) + \sum_{\sigma,\alpha} g^i_{p,x^\alpha}(\eta_j)\, g^\alpha_\sigma(y(t_j))\,\Delta_j z^\sigma\Big]\Delta_j z^p,$

where

(11.9)  $\eta_j = y(t_j) + \theta_j(\omega) \sum_\sigma g_\sigma(y(t_j))\,\Delta_j z^\sigma$

for some $\theta_j(\omega)$ in $(0, 1)$. As in Section 9, we use $C_1$, $C_2$, and so forth, to denote numbers determined by the data of the problem. Then, by Assumption 11.1, (11.9) implies

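(11.10)  $|g^i_{p,x^\alpha}(\eta_j) - g^i_{p,x^\alpha}(y(t_j))| \le (C_1 + \cdots)$

From this and (11.8), since $\omega \in A_j$,

(11.11)  $\Big|\bar x^i(t_{j+1}) - \bar x^i(t_j) - \sum_p g^i_p(\bar x(t_j))\,\Delta_j z^p - \tfrac12 \sum_{p,\sigma,\alpha} g^i_{p,x^\alpha}(\bar x(t_j))\, g^\alpha_\sigma(\bar x(t_j))\,\Delta_j z^\sigma\,\Delta_j z^p\Big| \le (C_3 + C_4|\bar x(t_j)|)\cdots\Delta_j t.$

This also holds if $\omega \in \Omega - A_j$, since then the left member is 0 by definition. By taking the expectation of the square of the left member of (11.11), we find, as in (9.28), that (8.5) also is satisfied.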
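The core of the proof is that, on $A_j$, the step (11.2) and the expanded step (11.6) differ only by a third-order term, which (11.8) and (11.9) exhibit through the mean-value remainder. The following is a scalar numerical check, using the hypothetical coefficient $g = \sin$ (so $g' = \cos$). A Taylor expansion of (11.2) gives the gap $\tfrac14 g''(x)\,g(x)^2\,(\Delta z)^3 + O((\Delta z)^4)$, so at $x = 1$ the printed ratios should settle near $-\sin^3(1)/4$.

```python
import math

def heun_step(x, g, dz):
    # scheme (11.2) with one driving process
    return x + 0.5 * (g(x) + g(x + g(x) * dz)) * dz

def expanded_step(x, g, dg, dz):
    # step (11.6) with one driving process: x + g dz + (1/2) g' g dz^2
    return x + g(x) * dz + 0.5 * dg(x) * g(x) * dz * dz

g, dg = math.sin, math.cos            # hypothetical coefficient and its derivative
ratios = {}
for dz in (1e-1, 1e-2, 1e-3):
    gap = heun_step(1.0, g, dz) - expanded_step(1.0, g, dg, dz)
    ratios[dz] = gap / dz ** 3
print(ratios, -math.sin(1.0) ** 3 / 4)
```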

Now by Theorem 8.1, $\bar x$ converges to the solution $x$ of (9.14) uniformly in $L_2$ norm as mesh $\Pi \to 0$. Since $x$ is continuous in $L_2$ norm, and the $\bar x(t_j)$ are uniformly close in $L_2$ norm to $x(t_j)$ at all division points of $\Pi$ if mesh $\Pi$ is small, a further step follows easily. If we modify $\bar x$ by retaining its values at the $t_j$ but interpolating linearly between them, the modified process also converges to $x$ uniformly in $L_2$ norm. But this modified process coincides with the Runge-Kutta process $y$ for all $\omega$ in $A$, and so the proof is complete.

Assumption 11.1 is strong, but from an experimental or computational point of view it can be tolerated. Ordinarily there will be some bound $R$ on the norms of the $x$ that interest us. In an experiment, points with $|x| > R$ will be too far away to be involved in the process under investigation; in computation, $R$ could be a bound on the numbers within the machine's capacity. Suppose we replace the $g^i_p(x)$ by other functions $G^i_p(x)$ that satisfy Assumption 11.1 and coincide with $g^i_p(x)$ whenever $|x| \le R$. Then the solution of (9.14) with coefficients $G^i_p$ will coincide with the solution of (9.13) as written unless the solution somewhere has norm greater than $R$. Unless the probability of this is negligibly small, we face worse troubles than the mere nonconvergence of the Runge-Kutta procedure.
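One way to build replacement coefficients $G^i_p$ of this kind is a smooth radial cutoff. Multiply $g_p$ by a $C^\infty$ factor equal to 1 for $|x| \le R$ and to 0 for $|x| \ge 2R$. If the $g^i_p$ have continuous second derivatives, each $G^i_p$ then has compact support and uniformly Lipschitz first derivatives. Whether this meets every clause of Assumption 11.1 depends on the exact form of that assumption, so treat the following as a hedged sketch; the outer radius $2R$, the function names, and the rotation field are illustrative choices.

```python
import math

def _bump(u):
    # exp(-1/u) for u > 0, identically 0 for u <= 0; infinitely differentiable
    return math.exp(-1.0 / u) if u > 0.0 else 0.0

def cutoff(r, R):
    # infinitely differentiable in r: equals 1 for r <= R and 0 for r >= 2R
    u = (2.0 * R - r) / R          # u >= 1 when r <= R, u <= 0 when r >= 2R
    return _bump(u) / (_bump(u) + _bump(1.0 - u))

def truncated(g, R):
    """Hypothetical G_p: agrees with g_p on |x| <= R and vanishes for |x| >= 2R."""
    def G(x):
        r = math.sqrt(sum(c * c for c in x))
        phi = cutoff(r, R)
        return [phi * c for c in g(x)]
    return G

rotation = lambda x: [x[1], -x[0]]     # hypothetical coefficient field
G = truncated(rotation, R=2.0)
print(G([1.0, 1.0]), G([3.0, 3.0]))    # unchanged inside radius R, zero beyond 2R
```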

Professor H. Rubin informs me that Dr. Donald Fisk, in his doctoral dissertation at Michigan State University, has defined and studied a stochastic integral which is the limit of "trapezoidal rule" approximations, the values of the integrand at the beginning and end of each interval $[t_j, t_{j+1}]$ being averaged. (Professor Rubin also furnished reference [2].) Existence theorems are established for integrals $\int f\,dz$ in which $z$ is a quasi-martingale (see Fisk, [3]). I have not had the opportunity of seeing this dissertation. Still, it is evident that the application of Fisk's integral to differential equations (1.3) must be closely related to the procedure described in Theorem 11.1, and even more closely related to the process mentioned in (11.3).
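The contrast between trapezoidal-rule sums of the kind attributed to Fisk and left-endpoint sums is already visible for $\int_0^1 z\,dz$, with $z$ a Brownian motion. The trapezoidal sum telescopes to $\tfrac12 z(1)^2$, the value ordinary calculus would give. The left-endpoint sums equal $\tfrac12\big(z(1)^2 - \sum_j (\Delta_j z)^2\big)$ and so approach $\tfrac12(z(1)^2 - 1)$. This Python sketch uses a simulated path; the integrand $f = z$ is a hypothetical example, not taken from Fisk's dissertation.

```python
import math
import random

rng = random.Random(3)
n = 100_000
z = [0.0]
for _ in range(n):
    z.append(z[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))

# left-endpoint sums and trapezoidal-rule sums for the integral of z dz over [0, 1]
left = sum(z[j] * (z[j + 1] - z[j]) for j in range(n))
trap = sum(0.5 * (z[j] + z[j + 1]) * (z[j + 1] - z[j]) for j in range(n))
print(left, (z[-1] ** 2 - 1.0) / 2)   # close to each other
print(trap, z[-1] ** 2 / 2)           # equal up to rounding: the sum telescopes
```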